RECOMBINANT PROTEIN PRODUCTION IN BOVINE ADENOVIRUS EXPRESSION VECTOR SYSTEM

Technical Field 

The present invention relates to novel bovine adenovirus (BAV) expression vector systems in which one or both of the early region 1 (E1) and early region 3 (E3) gene deletions are replaced by a foreign gene. It also relates to novel recombinant mammalian cell lines that are stably transformed with BAV E1 sequences and therefore express E1 gene products, so that a bovine adenovirus with an E1 gene deletion replaced by a foreign gene can replicate in them. These materials are used to produce recombinant BAV expressing heterologous (antigenic) polypeptides or fragments thereof, for use as live recombinant virus or subunit vaccines or in other therapies.

Background of the Invention 

Adenoviruses cause enteric or respiratory infections in humans as well as in domestic and laboratory animals.

The bovine adenoviruses (BAVs) comprise at least nine serotypes divided into two subgroups. These subgroups have been characterized by enzyme-linked immunosorbent assays (ELISA), serologic studies with immunofluorescence assays, virus-neutralization tests and immunoelectron microscopy, and by their host specificity and clinical syndromes. Subgroup 1 viruses include BAV 1, 2, 3 and 9 and grow relatively well in established bovine cells compared to subgroup 2, which includes BAV 4, 5, 6, 7 and 8.

BAV3 was first isolated in 1965. It is the best characterized of the BAV genotypes and contains a genome of approximately 35 kb (Kurokawa et al. (1978) J. Virol. 28:212-218). The locations of the hexon gene (Hu et al. (1984) J. Virol. 49:604-608) and the proteinase gene (Cai et al. (1990) Nuc. Acids Res. 18:5568) in the BAV3 genome have been identified and sequenced. However, the locations and sequences of other genes in the BAV genome, such as those of early region 1 (E1) and early region 3 (E3), have not been reported.

The human adenovirus (HAd) genome has two important regions, E1 and E3, into which foreign genes can be inserted to generate recombinant adenoviruses (Berkner and Sharp (1984) Nuc. Acids Res. 12:1925-1941; Haj-Ahmad and Graham (1986) J. Virol. 57:267-274). E1 proteins are essential for virus replication in tissue culture. However, conditional-helper adenovirus recombinants containing foreign DNA in the E1 region can be generated in a cell line that constitutively expresses E1 (Graham et al. (1977) J. Gen. Virol. 36:59-72). In contrast, the E3 gene products of HAd2 and HAd5 are not required for infectious virion production in vitro or in vivo, but they play an important role in host immune responses to virus infection (Andersson et al. (1985) Cell 43:215-222; Burgert et al. (1987) EMBO J. 6:2019-2026; Carlin et al. (1989) Cell 57:135-144; Ginsberg et al. (1989) PNAS USA 86:3823-3827; Gooding et al. (1988) Cell 53:341-346; Tollefson et al. (1991) J. Virol. 65:3095-3105; Wold and Gooding (1989) Mol. Biol. Med. 6:433-452; Wold and Gooding (1991) Virology 184:1-8). The E3 19-kilodalton (kDa) glycoprotein (gp19) of human adenovirus type 2 (HAd2) binds to the heavy chain of a number of class I major histocompatibility complex (MHC) antigens in the endoplasmic reticulum, thus inhibiting their transport to the plasma membrane (Andersson et al. (1985) Cell 43:215-222; Burgert and Kvist (1985) Cell 41:987-997; Burgert and Kvist (1987) EMBO J. 6:2019-2026). The E3 14.7 kDa protein of HAd2 or HAd5 prevents lysis of virus-infected mouse cells by tumor necrosis factor (TNF) (Gooding et al.
infection (Darbyshire et al., 1966 Res. Vet. Sci. 7:81-93; Mattson et al., 1988 J. Vet. Res. 49:67-69). BAV3 can produce tumors when injected into hamsters (Darbyshire, 1966 Nature 211:102). Viral DNA can efficiently effect morphological transformation of mouse, hamster or rat cells in culture (Tsukamoto and Sugino, 1972 J. Virol. 9:465-473; Motoi et al., 1972 Gann 63:415-418; M. Hitt, personal communication). Cross-hybridization has been observed between BAV3 and human adenovirus type 2 (HAd2) (Hu et al., 1984 J. Virol. 49:604-608) in most regions of the genome. These include some regions near, but not at, the left end of the genome.

The E1A gene products of the group C human adenoviruses have been studied very extensively. They have been shown to mediate transactivation of both viral and cellular genes (Berk et al., 1979 Cell 17:935-944; Jones and Shenk, 1979 Cell 16:683-689; Nevins, 1981 Cell 26:213-220; Nevins, 1982 Cell 29:913-919; reviewed in Berk, 1986 Ann. Rev. Genet. 20:45-79). They also effect transformation of cells in culture (reviewed in Graham, F.L. (1984) "Transformation by and oncogenicity of human adenoviruses." In: The Adenoviruses, H.S. Ginsberg, Editor. Plenum Press, New York; Branton et al., 1985 Biochim. Biophys. Acta 780:67-94) and induce cell DNA synthesis and mitosis (Zerler et al., 1987 Mol. Cell. Biol. 7:821-929; Bellet et al., 1989 J. Virol. 63:303-310; Howe et al., 1990 PNAS USA 87:5883-5887; Howe and Bayley, 1992 Virology 186:15-24). The E1A transcription unit comprises two coding sequences separated by an intron, which is removed from all processed E1A transcripts. In the two largest mRNA species produced from the E1A transcription unit, the first coding region is further subdivided into two parts. Exon 1 is a sequence found in both the 12S and 13S mRNA species. The unique region is found only in the 13S mRNA species.
Comparisons between the E1A proteins of human and simian adenoviruses have defined three regions of somewhat conserved protein sequence (conserved regions, CR) (Kimelman et al., 1985 J. Virol. 53:399-409). CR1 and CR2 are encoded in exon 1, while CR3 is encoded in the unique sequence and a small portion of exon 2. Binding sites for a number of cellular proteins have been defined in exon 1-encoded regions of E1A proteins (Yee and Branton, 1985 Virology 147:142-153; Harlow et al., 1986 Mol. Cell. Biol. 6:1579-1589; Barbeau et al., 1992 Biochem. Cell Biol. 70:1123-1134). These proteins include the retinoblastoma protein Rb, cyclin A and an associated protein kinase p33, and others that remain unassigned. Interaction of E1A with these cellular proteins has been implicated as the mechanism through which E1A participates in immortalization and oncogenic transformation (Egan et al., 1989 Oncogene 1:383-388; Whyte et al., 1988 Nature 334:124-129; Whyte et al., 1988 J. Virol. 62:257-265). E1A alone may transform or immortalize cells in culture. However, high-frequency transformation of rodent cells in culture usually requires coexpression of E1A with the E1B-19k protein, the E1B-55k protein, or both (reviewed in Graham, 1984 supra; Branton et al., 1985 supra; McLorie et al., 1991 J. Gen. Virol. 72:1467-1471).

In permissive infection of human cells, transactivation of other viral early genes is mediated principally by the amino acid sequence encoded in the CR3 region of E1A (Lillie et al., 1986 Cell 46:1043-1051). Conserved cysteine residues in a Cys-X2-Cys-X13-Cys-X2-Cys sequence motif in the unique region are associated with metal ion binding activity (Berg, 1986 supra). These residues are essential for transactivation activity (Jelsma et al., 1988 Virology 163:494-502; Culp et al., 1988 PNAS USA 85:6450-6454). In addition, the amino acids in CR3 immediately amino (N)-terminal to the metal binding domain have been shown to be important in transcription activation. The amino acids immediately carboxy (C)-terminal to the metal binding domain are important in forming associations with the promoter region (Lillie and Green, 1989 Nature 338:39-44; see Fig. 2).
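As an illustration only, the Cys-X2-Cys-X13-Cys-X2-Cys spacing described above can be written as a regular expression and searched for in a protein sequence. The sketch below assumes the sequence is given in one-letter amino acid codes. The function name `find_cys_motifs` and the demonstration sequence are hypothetical and are not taken from the E1A proteins discussed here.

```python
import re

# Four cysteines separated by exactly 2, 13 and 2 residues of any amino acid.
# The lookahead "(?=...)" lets overlapping occurrences be reported as well.
CYS_MOTIF = re.compile(r"(?=(C.{2}C.{13}C.{2}C))")


def find_cys_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in CYS_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical synthetic sequence, not a real adenovirus protein.
    demo = "MAA" + "CAAC" + "G" * 13 + "CKKC" + "LL"
    print(find_cys_motifs(demo))  # one hit, starting at position 4
```

A pattern search like this only locates the spacing. Whether a hit actually binds metal, as the CR3 cysteines do, cannot be inferred from sequence alone.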

The application of genetic engineering has resulted in several attempts to prepare adenovirus expression systems for obtaining vaccines. Examples of such research include the following disclosures:

U.S. patent 4,510,245, on an adenovirus major late promoter for expression in a yeast host;

U.S. patent 4,920,209, on a live recombinant adenovirus type 7 with a gene coding for hepatitis B surface antigen located at a deleted early region 3;

European patent 389 286, on a non-defective human adenovirus 5 recombinant expression system in human cells for the HCMV major envelope glycoprotein;

WO 91/11525, on a live, non-pathogenic, immunogenic, viable canine adenovirus in a cell expressing E1a proteins;

French patent 2 642 767, on vectors containing a leader and/or promoter from the E3 region of adenovirus 2.

Selecting a suitable virus to act as a vector for foreign gene expression, and identifying a suitable non-essential region as the site for inserting the gene, pose a challenge. In particular, the insertion site must be non-essential for viable replication of the virus and for its effective operation both in tissue culture and in vivo. The insertion site must also be able to accept new genetic material while still allowing the virus to replicate. An essential region of a virus genome can also be used for foreign gene insertion if the recombinant virus is grown in a cell line that complements the function of that essential region in trans.

The present inventors have now identified suitable regions in the BAV genome and have succeeded in inserting foreign genes to generate BAV recombinants.

Disclosure of the Invention

The present invention relates to novel bovine adenovirus expression vector systems in which part or all of one or both of the E1 and E3 gene regions are deleted. It also relates to recombinant mammalian cell lines of bovine origin that are transformed with BAV E1 sequences and thus constitutively express the E1 gene products. In these cell lines, a bovine adenovirus can replicate even when part or all of its E1 gene region has been deleted and replaced by a heterologous nucleotide sequence encoding a foreign gene or fragment thereof. The invention further relates to the use of these materials in the production of heterologous (antigenic) polypeptides or fragments thereof.

The invention also relates to a method of preparing a live recombinant virus or subunit vaccine for producing antibodies or cell-mediated immunity to an infectious organism in a mammal, such as a bovine. The method comprises inserting into the bovine adenovirus genome the gene or fragment coding for the antigen that corresponds to said antibodies or induces said cell-mediated immunity, with or without an effective promoter therefor, to produce BAV recombinants.

Generally, the foreign gene construct is cloned into a nucleotide sequence that represents only part of the entire viral genome and has one or more appropriate deletions. This chimeric DNA sequence is usually carried in a plasmid, which allows successful cloning to produce many copies of the sequence. The cloned foreign gene construct can then be incorporated into the complete viral genome, for example by in vivo recombination following a DNA-mediated cotransfection technique. Multiple copies of a coding sequence, or more than one coding sequence, can be inserted so that the recombinant vector expresses more than one foreign protein. The foreign gene can carry additions, deletions or substitutions that enhance expression and/or immunological effects of the expressed protein.

The invention also includes an expression system comprising a bovine adenovirus expression vector in which heterologous nucleotide sequences, with or without any exogenous regulatory elements, replace the E1 gene region and/or part or all of the E3 gene region.

The invention also includes the following:

(A) a recombinant vector system comprising the entire BAV DNA and one or two plasmids capable of generating a recombinant virus by in vivo recombination. Recombination follows cotransfection of a suitable cell line with BAV DNA representing the entire wild-type BAV genome and a plasmid comprising bovine adenovirus left-end or right-end sequences. These contain the E1 or E3 gene region, respectively, with a heterologous nucleotide sequence encoding a foreign gene or fragment thereof substituted for part or all of that region;

(B) a live recombinant bovine adenovirus (BAV) vector system selected from the group consisting of:
(a) a system wherein part or all of the E1 gene region is replaced by a heterologous nucleotide sequence encoding a foreign gene or fragment thereof;
(b) a system wherein part or all of the E3 gene region is replaced by a heterologous nucleotide sequence encoding a foreign gene or fragment thereof; and
(c) a system wherein part or all of the E1 gene region and part or all of the E3 gene region are deleted, and a heterologous nucleotide sequence encoding a foreign gene or fragment thereof is inserted into at least one of the deletions;

(C) a recombinant bovine adenovirus (BAV) comprising a deletion of part or all of the E1 gene region, a deletion of part or all of the E3 gene region, or both deletions. A heterologous nucleotide sequence coding for an antigenic determinant of a disease-causing organism is inserted into at least one deletion;

(D) a recombinant bovine adenovirus expression system comprising a deletion of part or all of E1, a deletion of part or all of E3, or both deletions. A heterologous nucleotide sequence coding for a foreign gene or fragment thereof, under the control of an expression promoter, is inserted into at least one deletion;

(E) a recombinant bovine adenovirus (BAV) for producing an immune response in a mammalian host, for use as a live recombinant virus or recombinant protein or subunit vaccine, comprising:
(1) a BAV recombinant containing a heterologous nucleotide sequence coding for an antigenic determinant needed to obtain the desired immune response; and, optionally,
(2) an effective promoter to provide expression of said antigenic determinant in immunogenic quantities; and

(F) a mutant bovine adenovirus (BAV) comprising a deletion of part or all of E1 and/or a deletion of part or all of E3.

The invention also includes recombinant mammalian cell lines stably transformed with BAV E1 gene region sequences. These recombinant cell lines are thereby capable of allowing replication therein of a bovine adenovirus comprising a deletion of part or all of the E1 or E3 gene regions replaced by a heterologous or homologous nucleotide sequence encoding a foreign gene or fragment thereof.

The invention also includes the production, isolation and purification of polypeptides or fragments thereof, such as growth factors, receptors and other cellular proteins, from recombinant bovine cell lines expressing BAV E1 gene products.

The invention also includes a method for providing gene therapy to a mammal in need thereof to control a gene deficiency. The method comprises administering to said mammal a live recombinant bovine adenovirus containing a foreign nucleotide sequence encoding a non-defective form of said gene, under conditions wherein the recombinant virus vector genome is incorporated into said mammalian genome or is maintained independently and extrachromosomally to

claimed herein is intended to cover all of these bovine adenovirus types.



Brief Description of the Drawings 
Figure 1. Sequence and major open reading frames of the left 11% of the BAV3 genome. The region comprises the E1 and protein IX transcription regions. The 195-nucleotide inverted terminal repeat sequence identified by Shinagawa et al., 1987 Gene 55:85-93 is shown in italics. The amino acid sequences of the largest E1A protein, two E1B proteins and protein IX are presented. The probable splice donor ([), splice acceptor (]) and intron sequence (underlined italics) within the E1A region are marked. A 35 base pair repeat sequence between E1A and E1B is indicated in bold underline. Possible transcription promoter TATA sequences and possible polyadenylation signal sequences (AATAAA) are also indicated.

Figure 2. Regions of homology in the E1A proteins of BAV3 and human adenovirus type 5 (HAd5). The amino acid residue numbers of each serotype are indicated. A. Conserved region 3 (CR3) of HAd5, subdivided into three functional regions as defined by Lillie et al. (1989) Nature 338:39-44 and described in the Background of the Invention. The intron sequence of BAV3 E1A occurs within the serine codon at position 204. B. A portion of conserved region 2 (CR2) of HAd5, showing the residues thought to be important in binding the retinoblastoma protein Rb (Dyson et al., 1990 J. Virol. 64:1353-1356), and the comparable sequence from BAV3.

Figure 3. Homology regions between the HAd5 E1B 19k (176R) protein and the corresponding BAV3 (157R) protein. The amino acid residue numbers for each of the viruses are indicated.

Figure 4. The C-terminal 346R of HAd5 E1B 56k (496R) and the corresponding BAV3 protein (420R). The HAd5 protein comparison begins at residue 150, and the BAV3 comparison (in italics) begins at residue 74. The amino-terminal regions of these proteins are not presented and show no significant homology.

WO 95/16048 PCT/CA94/00678

Figure 5. Homology comparison of the amino acid sequence of HAd5 protein IX and the corresponding protein of BAV3 (in italics).

Figure 6. The genome of BAV3, showing the locations of EcoRI, XbaI and BamHI sites and the structure of the 5100 bp segment from 77 to 92 m.u. ORFs on the upper strand that can encode 60 amino acids or more are represented by bars. Shaded portions indicate regions of similarity to the pVIII, 14.7K E3 and fibre proteins of HAd2 or HAd5. An open triangle marks each first methionine followed by a stretch of at least 50 amino acids. Closed triangles mark the termination codons of ORFs likely to code for viral proteins.

Figure 7. Nucleotide sequence of BAV3 between 77 and 92 m.u., showing ORFs that have the potential to encode polypeptides of at least 50 amino acids after the initiating methionine. The nucleotide sequence was analyzed using the program DISPCOD (PC/GENE). Potential N-glycosylation sites (N-X-T/S) and polyadenylation signals are underlined, and the first methionine of each ORF is shown in bold.
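The ORF criterion used for Figure 7 can be approximated in a few lines. That criterion is at least 50 amino acids after the initiating methionine on the upper strand. The N-X-T/S glycosylation sequon can be handled the same way. This is a minimal sketch and not the DISPCOD algorithm. It assumes the standard genetic code and scans only the strand given, in three frames. It takes the first ATG after each in-frame stop as the start, and it drops ORFs that run off the end of the sequence without a stop codon. All function names are hypothetical. The sequon search follows the N-X-T/S pattern literally. Many tools also exclude proline at X.

```python
import re

# Standard genetic code; codon positions enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_orfs(seq: str, min_after_met: int = 50) -> list[dict]:
    """ORFs on the given (upper) strand in frames 1-3.

    An ORF runs from the first ATG after the previous in-frame stop to the
    next in-frame stop codon. It is kept if at least `min_after_met`
    residues follow the initiating methionine.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:pos])  # stop codon excluded
                if len(protein) - 1 >= min_after_met:
                    # "start"/"end" are 1-based; "end" is the last base of the stop codon
                    orfs.append({"frame": frame + 1, "start": start + 1,
                                 "end": pos + 3, "protein": protein})
                start = None
    return orfs


def n_glycosylation_sites(protein: str) -> list[int]:
    """1-based positions of Asn in N-X-T/S sequons (overlapping sites allowed)."""
    return [m.start() + 1 for m in re.finditer(r"(?=N.[ST])", protein)]
```

Run `translate` on each reported ORF, then run `n_glycosylation_sites` on the result. Together they give the two kinds of annotation that Figure 7 marks.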

Figure 8. Comparison between the predicted amino acid sequences for the ORFs of BAV3 and known proteins of HAd2 or HAd5, using the computer program PALIGN (PC/GENE) with the following settings: comparison matrix, structural-genetic matrix; open gap cost, 6; unit gap cost, 2. Identical residues are indicated by a colon and similar residues by a dot. (a) Comparison between the predicted amino acid sequence encoded by the 3' end of BAV3 ORF 1 and the HAd2 hexon-associated pVIII precursor. (b) Comparison between BAV3 ORF 4 and the HAd5 14.7K E3 protein. (c) Comparison between the predicted amino acid sequence encoded by BAV3 ORF 6 and the HAd2 fibre protein.
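Figure 8 marks identical residues with a colon and similar residues with a dot, and scores gaps with two parameters. Both can be illustrated on a pair of sequences that are already aligned. This sketch does not reproduce PALIGN or its structural-genetic matrix. The similarity groups below are a hypothetical stand-in. The gap cost assumes an affine convention in which each gap of length k costs open + unit * k. PALIGN's exact convention may differ.

```python
import re

# Hypothetical groups of "similar" residues; NOT the PC/GENE structural-genetic matrix.
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def _similar(a: str, b: str) -> bool:
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def match_line(top: str, bottom: str) -> str:
    """':' for identical residues, '.' for similar ones, blank otherwise or at gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for a, b in zip(top, bottom):
        if "-" in (a, b):
            marks.append(" ")
        elif a == b:
            marks.append(":")
        elif _similar(a, b):
            marks.append(".")
        else:
            marks.append(" ")
    return "".join(marks)


def gap_penalty(top: str, bottom: str, open_cost: int = 6, unit_cost: int = 2) -> int:
    """Assumed affine scheme: each run of k gap characters costs open_cost + unit_cost * k."""
    runs = re.findall(r"-+", top) + re.findall(r"-+", bottom)
    return sum(open_cost + unit_cost * len(run) for run in runs)


if __name__ == "__main__":
    top, bottom = "MKV-LS", "MRVQIT"
    print(top)
    print(match_line(top, bottom))
    print(bottom)
    print("gap penalty:", gap_penalty(top, bottom))
```

Under an affine scheme, one long gap costs less than several short gaps of the same total length. That is why the open cost is set higher than the unit cost.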

Figure 9. Construction of a BAV3 E3 transfer vector containing the firefly luciferase gene. The 3.0 kb BamHI 'D' fragment of the BAV3 genome falls between m.u. 77.8 and 86.4 and contains almost the entire E3 region (Mittal et al. (1992) J. Gen. Virol. 73:3295-3300). This 3.0 kb fragment was isolated by digesting BAV3 DNA with BamHI and cloned into pUC18 at the BamHI site to obtain pSM14. Similarly, the 4.8 kb BamHI 'C' fragment of BAV3 DNA, which extends between m.u. 86.4 and 100, was isolated and inserted into pUC18 to produce pSM17.

To delete a 696 bp XhoI-NcoI fragment, pSM14 was cleaved with XhoI and NcoI and the larger fragment was purified. Its ends were made blunt with the Klenow fragment of DNA polymerase I, and an NruI-SalI linker was inserted to generate pSM14del2. A 2.3 kb BamHI fragment containing BAV3 sequences, an E3 deletion, and NruI and SalI cloning sites was then inserted into pSM17 at the BamHI site to obtain pSM41. This step was not required for construction of a BAV3 E3 transfer vector.

A 1716 bp fragment containing the firefly luciferase gene (de Wet et al. (1987) Mol. Cell. Biol. 7:725-737) was isolated by digesting pSVOA/L with BsmI and SspI as described (Mittal et al. (1993) Virus Res. 28:67-90). pSVOA/L was provided by D. R. Helinski, University of California at San Diego, CA. The ends of the fragment were made blunt with Klenow. The luciferase gene was inserted into pSM41 at the SalI site by blunt-end ligation. The resulting plasmid, named pSM41-Luc, contained the luciferase gene in the same orientation as the E3 transcription unit. The plasmid pKN30 was digested with XbaI and inserted into pSM41-Luc, partially cleaved with XbaI, at an XbaI site within the luciferase gene to obtain pSM41-Luc-Kan.

The plasmid pSM14 was digested with BamHI, and a 3.0 kb fragment was isolated and inserted into pSM17 at the BamHI site to generate pSM43. The 18.5 kb XbaI 'A' fragment of the BAV3 genome falls between m.u. 31.5 and 84.3. It was cloned into pUC18 at the XbaI site to produce pSM21. An 18.5 kb XbaI fragment was purified from pSM21 after cleavage with XbaI and inserted into pSM43 at the XbaI site, and the resulting plasmid was named pSM51.

A 7.7 kb BamHI fragment containing the luciferase gene and the kan^r gene was isolated by digesting pSM41-Luc-Kan with BamHI. This fragment was ligated to pSM51, partially digested with BamHI, and pSM51-Luc-Kan was isolated by selection in the presence of ampicillin and kanamycin. Finally, the kan^r gene was deleted from pSM51-Luc-Kan by partial cleavage with XbaI and religation to obtain pSM51-Luc.
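Map units (m.u.) express positions as a percentage of genome length. The fragment sizes in this figure can therefore be cross-checked against their m.u. coordinates. The sketch below assumes a genome length of 35,000 bp, taken from the approximate 35 kb BAV3 genome size given in the Background. The function names are hypothetical, and the results are estimates only.

```python
# Approximate BAV3 genome length in bp (the text gives "approximately 35 kb").
BAV3_GENOME_BP = 35_000


def mu_to_bp(mu: float, genome_bp: int = BAV3_GENOME_BP) -> int:
    """Convert a map-unit coordinate (0-100) to an approximate base-pair position."""
    return round(mu / 100 * genome_bp)


def fragment_kb(start_mu: float, end_mu: float, genome_bp: int = BAV3_GENOME_BP) -> float:
    """Approximate length in kb of the segment between two map-unit coordinates."""
    return (end_mu - start_mu) / 100 * genome_bp / 1000


if __name__ == "__main__":
    fragments = {
        "BamHI 'D' (pSM14)": (77.8, 86.4),
        "BamHI 'C' (pSM17)": (86.4, 100.0),
        "XbaI 'A' (pSM21)": (31.5, 84.3),
    }
    for name, (start, end) in fragments.items():
        print(f"{name}: ~{fragment_kb(start, end):.1f} kb")
```

With these assumptions the three estimates come out at about 3.0, 4.8 and 18.5 kb, consistent with the fragment sizes stated in the legend.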

Figure 10. Generation of BAV3 recombinants containing the firefly luciferase gene in the E3 region. The plasmid pSM51-Luc contains the BAV3 genome between m.u. 77.8-84.3 and 31.5-100, a 696 bp deletion in E3, and the luciferase gene in E3 in the E3-parallel orientation. BAV3 genomic DNA digested with PvuI and uncut pSM51-Luc were used to cotransfect MDBK cells transformed with a plasmid containing BAV3 E1 sequences. This rescued the luciferase gene into the E3 region of the BAV3 genome by in vivo recombination. The resulting BAV3-luciferase recombinants (BAV3-Luc) were isolated from two independent experiments and named BAV3-Luc (3.1) and BAV3-Luc (3.2). The BamHI restriction map of the BAV3-Luc genome is shown. A hatched arrow shows the position and orientation of the firefly luciferase gene.

Figure 11. Southern blot analyses of restriction enzyme-digested DNA from the wt BAV3 and recombinant genomes. The probes were a 696 bp XhoI-NcoI fragment from pSM14 (Fig. 9) and a DNA fragment containing the luciferase gene. DNA (100 ng) was isolated from mock-infected MDBK cells (lanes 1, 2, 3) and from MDBK cells infected with BAV3-Luc (3.1) (lanes 4, 5, 6), BAV3-Luc (3.2) (lanes 7, 8, 9) or wt BAV3 (lanes 10, 11, 12). Each sample was digested with BamHI (lanes 1, 4, 7, 10), EcoRI (lanes 2, 5, 8, 11) or XbaI (lanes 3, 6, 9, 12) and analyzed by agarose gel electrophoresis. The DNA fragments were transferred from the gel onto a GeneScreenPlus™ membrane. In panel A, the membrane was hybridized with the 696 bp XhoI-NcoI fragment from pSM14 (Fig. 9), labeled with 32P using a Pharmacia Oligolabeling Kit. The panel B blot carries duplicate samples of panel A but was probed with a 1716 bp BsmI-SspI fragment containing the luciferase gene (Fig. 9). Band sizes after hybridization are given in kb, on the right in panel A and on the left in panel B.

B: BamHI, E: EcoRI, Xb: XbaI, 3.1: BAV3-Luc (3.1), 3.2: BAV3-Luc (3.2) and wt: wild-type BAV3.

Figure 12. Single-step growth curves for wt BAV3 and BAV3-Luc. Confluent monolayers of MDBK cells in 25 mm multi-well culture plates were inoculated with wt BAV3, BAV3-Luc (3.1) or BAV3-Luc (3.2) at an m.o.i. of 10 p.f.u. per cell. The virus was allowed to adsorb for 1 h at 37°C. The cell monolayers were then washed 3 times with PBS++ (0.137 M NaCl, 2.7 mM KCl, 8 mM Na2HPO4 and 1.5 mM KH2PO4, containing 0.01% CaCl2·2H2O and 0.01% MgCl2·6H2O). They were incubated at 37°C in 1 ml maintenance medium containing 2% horse serum. At various times post-infection, cells were harvested along with the supernatant, frozen and thawed three times, and titrated on MDBK cells by plaque assay. Results are the means of duplicate samples.
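The PBS++ recipe and the m.o.i. used in this experiment translate into simple weighing and dilution arithmetic. The sketch below is illustrative only. The molar masses assume anhydrous Na2HPO4 and KH2PO4. The cell number, stock titer and plaque counts in the usage lines are hypothetical placeholders, not values from these experiments.

```python
# Molar masses in g/mol (anhydrous salts assumed for the phosphates).
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "Na2HPO4": 141.96, "KH2PO4": 136.09}
# PBS++ composition from the Figure 12 legend.
PBS_MOLAR = {"NaCl": 0.137, "KCl": 0.0027, "Na2HPO4": 0.008, "KH2PO4": 0.0015}
PBS_PERCENT_WV = {"CaCl2.2H2O": 0.01, "MgCl2.6H2O": 0.01}


def pbs_plus_plus_grams(volume_l: float = 1.0) -> dict[str, float]:
    """Grams of each component needed for `volume_l` litres of PBS++."""
    grams = {salt: conc * MOLAR_MASS[salt] * volume_l for salt, conc in PBS_MOLAR.items()}
    # % (w/v) is g per 100 ml, so 0.01% corresponds to 0.1 g per litre.
    grams.update({salt: pct * 10 * volume_l for salt, pct in PBS_PERCENT_WV.items()})
    return grams


def inoculum_volume_ml(moi: float, cells: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock that delivers `moi` p.f.u. per cell."""
    return moi * cells / titer_pfu_per_ml


def titer_pfu_per_ml(plaques: int, dilution: float, volume_ml: float) -> float:
    """Plaque-assay titer: plaque count / (dilution factor x volume plated)."""
    return plaques / (dilution * volume_ml)


if __name__ == "__main__":
    for salt, g in pbs_plus_plus_grams().items():
        print(f"{salt}: {g:.3f} g/L")
    # Hypothetical: 1e6 cells per well, stock at 1e8 p.f.u./ml, m.o.i. 10.
    print(f"inoculum: {inoculum_volume_ml(10, 1e6, 1e8):.2f} ml")
```

The salt masses work out to roughly 8.0 g NaCl, 0.2 g KCl, 1.14 g Na2HPO4 and 0.2 g KH2PO4 per litre. These match the familiar PBS recipe.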

Figure 13. Kinetics of luciferase expression in MDBK cells infected with BAV3-Luc. Confluent MDBK cell monolayers in 25 mm multi-well culture plates were infected with BAV3-Luc (3.1) or BAV3-Luc (3.2) at an m.o.i. of 50 p.f.u. per cell. At the indicated time points post-infection, virus-infected cells were harvested and assayed in duplicate for luciferase activity.

Figure 14. Luciferase expression in MDBK cells infected with BAV3-Luc in the presence of 1-β-D-arabinofuranosyl cytosine (AraC). Confluent MDBK cell monolayers in 25 mm multi-well culture plates were infected with A) BAV3-Luc (3.1) or B) BAV3-Luc (3.2) at an m.o.i. of 50 p.f.u. per cell. They were incubated in the absence or presence of 50 µg AraC per ml of maintenance medium. At the indicated time points post-infection, virus-infected cells were harvested and assayed in duplicate for luciferase activity.

Figure 15. Transcription maps of the wt BAV3 and BAV3-Luc genomes in the E3 region. The genome of wt BAV3 between m.u. 77 and 82, which represents the E3 region, is shown, together with the XhoI and NcoI sites used to make an E3 deletion. (a) Bars represent the open reading frames (ORFs) in the upper strand of the wt BAV3 genome in the E3 region, in three frames (F1, F2 and F3). The shaded portions indicate regions of similarity to the pVIII and E3-14.7 kDa proteins of HAd5. Open and closed triangles mark the initiation and termination codons, respectively, of ORFs likely to code for viral proteins. (b) The predicted ORFs for the upper strand in E3 of the BAV3-Luc genome, in which the 696 bp XhoI-NcoI E3 deletion is replaced by the luciferase gene. The ORFs for the pVIII and E3-14.7 kDa proteins remain intact. The transcription map of wt BAV3 E3 was adapted from the DNA sequence submitted to the GenBank database under accession number D16839.

Figure 16. Western blot analysis of virus-infected MDBK cells using an anti-luciferase antibody. Confluent monolayers of MDBK cells were mock-infected (lane 1) or infected at an m.o.i. of 50 p.f.u. per cell with wt BAV3 (lane 2), BAV3-Luc (3.1) (lane 3) or BAV3-Luc (3.2) (lane 4). Cells were harvested at 18 h post-infection. Cell extracts were prepared and analyzed by SDS-PAGE and Western blotting using a rabbit anti-luciferase antibody. Purified firefly luciferase was used as a positive control (lane 5). Lane 5 was excised to obtain a shorter exposure. Protein molecular weight markers in kDa are shown on the left. The arrow indicates the 62 kDa luciferase band that reacted with the anti-luciferase antibody. wt: wild-type BAV3, 3.1: BAV3-Luc (3.1) and 3.2: BAV3-Luc (3.2).




Figure 17. Construction of pSM71-neo. An 8.4 kb SalI fragment of the BAV3 genome, which falls between m.u. 0 and 24, was isolated and inserted into pUC19 at the SalI-SmaI site to generate pSM71. The plasmid pRSDneo (Fitzpatrick et al. (1990) Virology 176:145-157) contains the neomycin-resistance (neo^r) gene flanked by simian virus 40 (SV40) regulatory sequences. These sequences come originally from the plasmid pSV2neo (Southern et al. (1982) J. Mol. Appl. Genet. 1:327-341). In pRSDneo, a portion of the SV40 sequences upstream of the neo^r gene was deleted to remove several false initiation codons. A 2.6 kb fragment containing the neo^r gene under the control of the SV40 regulatory sequences was obtained from pRSDneo by digestion with BamHI and BglII. It was cloned into pSM71 at the SalI site by blunt-end ligation to obtain pSM71-neo, which contains the neo^r gene in the E1-parallel orientation.

Figure 18. Construction of pSM61-kan1 and pSM61-kan2. An 11.9 kb BglII fragment of the BAV3 genome, which extends between m.u. 0 and 34, was purified and introduced into pUC19 at the BamHI-HincII site to obtain pSM61. The plasmid pKN30 contains the neo^r gene along with the SV40 promoter and polyadenylation sequences from the plasmid pSV2neo, without any modification. The entire pKN30 plasmid was inserted into pSM61 at the SalI site. This generated pSM61-kan1, with the neo^r gene in the E1 anti-parallel orientation, and pSM61-kan2, with the neo^r gene in the E1-parallel orientation.

Figure 19. Construction of an E1 transfer plasmid containing the beta-galactosidase gene.

The plasmid pSM71, which contains the BAV3 genome between m.u. 0 and 24, was cleaved with ClaI and partially with AvrII. This deleted a 2.6 kb AvrII-ClaI fragment (between m.u. 1.3 and 8.7) that falls within the E1 region. A 0.5 kb fragment containing the SV40 promoter and polyadenylation sequences was obtained from pFG144K5-SV by digestion with XbaI. It was inserted into pSM71 in place of the 2.6 kb deletion to generate pSM71-del1-SV. A 3.26 kb fragment containing the bacterial beta-galactosidase gene was isolated from pDUC/Z (Liang et al. (1993) Virology 195:42-50) after cleavage with NcoI and HindIII. It was cloned into pSM71-del1-SV at the BamHI site, placing the beta-galactosidase gene under the control of the SV40 regulatory sequences, to obtain pSM71-Z.

Modes of Carrying Out the Invention 

The practice of the present invention will employ, unless otherwise indicated, conventional microbiology, immunology, virology, molecular biology and recombinant DNA techniques, which are within the skill of the art. These techniques are fully explained in the literature. See, e.g., Maniatis et al., Molecular Cloning: A Laboratory Manual (1982); DNA Cloning: A Practical Approach, vols. I & II (D. Glover, ed.); Oligonucleotide Synthesis (N. Gait, ed. (1984)); Nucleic Acid Hybridization (B. Hames & S. Higgins, eds. (1985)); Transcription and Translation (B. Hames & S. Higgins, eds. (1984)); Animal Cell Culture (R. Freshney, ed. (1986)); Perbal, A Practical Guide to Molecular Cloning (1984); Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Edition), vols. I, II & III (1989).

A. Definitions 

In describing the present invention, the following terminology, as defined below, will be used.

A "replicon" is any genetic element (e.g., plasmid, chromosome, virus) that functions as an autonomous unit of DNA replication in vivo, i.e., is capable of replication under its own control.

A "vector" is a replicon, such as a plasmid, phage, cosmid or virus, to which another DNA segment may be attached so as to bring about the replication of the attached segment.

By "live virus" is meant, in contradistinction to "killed" virus, a virus which is capable of producing identical progeny in tissue culture and inoculated animals.

A "helper-free virus vector" is a vector that does not require a second virus or a cell line to supply something defective in the vector.

A "double-stranded DNA molecule" refers to
the polymeric form of deoxyribonucleotides (adenine,
guanine, thymine, or cytosine) in its normal,
double-stranded helix. This term refers only to the
primary and secondary structure of the molecule, and
does not limit it to any particular tertiary forms.
Thus, this term includes double-stranded DNA found,
inter alia, in linear DNA molecules (e.g., restriction
fragments of DNA from viruses, plasmids, and
chromosomes). In discussing the structure of
particular double-stranded DNA molecules, sequences
may be described herein according to the normal
convention of giving only the sequence in the 5' to 3'
direction along the nontranscribed strand of DNA
(i.e., the strand having the sequence homologous to
the mRNA).
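The reporting convention above can be illustrated with a minimal sketch: given the nontranscribed (coding) strand written 5' to 3', the template strand is its reverse complement, and the mRNA carries the same sequence with U in place of T. The function names and the example sequence are hypothetical, and only uppercase A, C, G and T are handled.

```python
# Complement table for the four DNA bases (uppercase only, for simplicity).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_strand(coding_5to3: str) -> str:
    """Return the transcribed (template) strand, also written 5' to 3'.

    The template strand is the reverse complement of the nontranscribed
    (coding) strand used for reporting sequences.
    """
    return coding_5to3.upper().translate(COMPLEMENT)[::-1]


def mrna_from_coding(coding_5to3: str) -> str:
    """The mRNA has the coding-strand sequence, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCCTGA"  # hypothetical example sequence
    print(template_strand(coding))   # TCAGGCCAT
    print(mrna_from_coding(coding))  # AUGGCCUGA
```

Reporting only one strand loses no information, because the other strand is always recoverable this way.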

A DNA "coding sequence" is a DNA sequence
which is transcribed and translated into a polypeptide
in vivo when placed under the control of appropriate
regulatory sequences. The boundaries of the coding
sequence are determined by a start codon at the 5'
(amino) terminus and a translation stop codon at the
3' (carboxy) terminus. A coding sequence can include,
but is not limited to, procaryotic sequences, cDNA
from eucaryotic mRNA, genomic DNA sequences from
eucaryotic (e.g., mammalian) DNA, viral DNA, and even
synthetic DNA sequences. A polyadenylation signal and
transcription termination sequence will usually be
located 3' to the coding sequence.
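Because a coding sequence is bounded by a start codon and an in-frame stop codon, candidate coding regions can be located by scanning for open reading frames (ORFs). The sketch below is a simple illustration under stated assumptions: it scans only the strand it is given, in all three frames, reports the first in-frame stop after each ATG, and uses a hypothetical minimum length (`min_codons`). It ignores introns, so a spliced gene such as E1A appears as separate ORFs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30):
    """Return (start, end) 0-based, half-open coordinates of ORFs on one strand.

    Each ORF runs from an ATG to the first in-frame stop codon, and `end`
    includes the stop codon. ATGs nested inside a reported ORF are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until an in-frame stop or the sequence end.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop codon in this frame before the end
                if (j - i) // 3 >= min_codons:  # codons before the stop
                    orfs.append((i, j + 3))
                i = j + 3
                continue
            i += 3
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTGACC", min_codons=1))  # [(2, 14)]
```

A real analysis would also scan the reverse complement, since adenovirus genes are encoded on both strands.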

A "transcriptional promoter sequence" is a
DNA regulatory region capable of binding RNA
polymerase in a cell and initiating transcription of a
downstream (3' direction) coding sequence. For
purposes of defining the present invention, the
promoter sequence is bounded at its 3' terminus by the
translation start codon (ATG) of a coding sequence and
extends upstream (5' direction) to include the minimum
number of bases or elements necessary to initiate
transcription at levels detectable above background.
Within the promoter sequence will be found a
transcription initiation site (conveniently defined by
mapping with nuclease S1), as well as protein binding
domains (consensus sequences) responsible for the
binding of RNA polymerase. Eucaryotic promoters will
often, but not always, contain "TATA" boxes and "CAAT"
boxes. Procaryotic promoters contain Shine-Dalgarno
sequences in addition to the -10 and -35 consensus
sequences.
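As a rough illustration of the "TATA" and "CAAT" boxes mentioned above, the sketch below reports where those literal strings occur in an upstream sequence. It is a simplifying assumption: real boxes are degenerate consensus motifs (for example TATAWAW), and a string match says nothing about whether a site is functional. The names `BOX_PATTERNS` and `find_boxes` are hypothetical.

```python
import re

# Literal box sequences named in the definition above. Real consensus motifs
# are degenerate (for example TATAWAW), which this sketch deliberately ignores.
BOX_PATTERNS = {
    "TATA": re.compile(r"(?=TATA)"),
    "CAAT": re.compile(r"(?=CAAT)"),
}


def find_boxes(upstream: str) -> dict:
    """Return 0-based start positions of each literal box in an upstream sequence.

    The zero-width lookahead lets overlapping hits (e.g. TATATA) count separately.
    """
    seq = upstream.upper()
    return {name: [m.start() for m in pat.finditer(seq)]
            for name, pat in BOX_PATTERNS.items()}


if __name__ == "__main__":
    print(find_boxes("ggCAATgcTATAAAg"))  # {'TATA': [8], 'CAAT': [2]}
```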

DNA "control sequences" refer collectively
to promoter sequences, ribosome binding sites,
polyadenylation signals, transcription termination
sequences, upstream regulatory domains, enhancers, and
the like, which collectively provide for the
transcription and translation of a coding sequence in
a host cell.

A coding sequence is "operably linked to" or
"under the control of" control sequences in a cell
when RNA polymerase will bind the promoter sequence
and transcribe the coding sequence into mRNA, which is
then translated into the polypeptide encoded by the
coding sequence.

A "host cell" is a cell which has been 
transformed, or is capable of transformation, by an 
exogenous DNA sequence. 

A cell has been "transformed" by exogenous
DNA when such exogenous DNA has been introduced inside
the cell membrane. Exogenous DNA may or may not be
integrated (covalently linked) into chromosomal DNA
making up the genome of the cell. In procaryotes and
yeasts, for example, the exogenous DNA may be
maintained on an episomal element, such as a plasmid.
A stably transformed cell is one in which the
exogenous DNA has become integrated into the
chromosome so that it is inherited by daughter cells
through chromosome replication. For mammalian cells,
this stability is demonstrated by the ability of the
cell to establish cell lines or clones comprised of a
population of daughter cells containing the exogenous
DNA.

A "clone" is a population of daughter cells
derived from a single cell or common ancestor. A
"cell line" is a clone of a primary cell that is
capable of stable growth in vitro for many
generations.

Two polypeptide sequences are "substantially
homologous" when at least about 80% (preferably at
least about 90%, and most preferably at least about
95%) of the amino acids match over a defined length of
the molecule.

Two DNA sequences are "substantially
homologous" when they are identical or differ in not
more than about 40% of the nucleotides, more
preferably not more than about 20% of the nucleotides,
and most preferably not more than about 10% of the
nucleotides.
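The homology thresholds above reduce to a percent-identity calculation over a defined length. The sketch below assumes the two sequences have already been aligned to equal length (it does not perform alignment) and treats a gap character `-` as a mismatch; the function names and tier labels are hypothetical, while the cut-offs (60/80/90% for DNA, 80/90/95% for polypeptides) follow the definitions given here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which two sequences carry the same residue.

    Both inputs must already be aligned to the same length; '-' marks a gap
    and always counts as a mismatch. This sketch does not perform alignment.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def dna_tier(identity: float) -> str:
    """Classify a DNA identity value (differs by at most 40/20/10%)."""
    if identity >= 90.0:
        return "most preferred"
    if identity >= 80.0:
        return "more preferred"
    if identity >= 60.0:
        return "substantially homologous"
    return "not substantially homologous"


def protein_tier(identity: float) -> str:
    """Classify a polypeptide identity value (at least 80/90/95% matching)."""
    if identity >= 95.0:
        return "most preferred"
    if identity >= 90.0:
        return "more preferred"
    if identity >= 80.0:
        return "substantially homologous"
    return "not substantially homologous"


if __name__ == "__main__":
    ident = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(ident)            # 90.0
    print(dna_tier(ident))  # most preferred
```

Note that the same 90% value falls in a lower tier for polypeptides than for DNA, because the protein thresholds are stricter.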

DNA sequences that are substantially
homologous can be identified in a Southern
hybridization experiment under, for example, stringent
conditions, as defined for that particular system.
Defining appropriate hybridization conditions is
within the skill of the art. See, e.g., Maniatis et
al., supra; DNA Cloning, vols. I & II, supra; Nucleic
Acid Hybridization, supra.

A "heterologous" region of a DNA construct
is an identifiable segment of DNA within or attached
to another DNA molecule that is not found in
association with the other molecule in nature. Thus,
when the heterologous region encodes a viral gene, the
gene will usually be flanked by DNA that does not
flank the viral gene in the genome of the source virus
i.e., produced from cells transformed by an exogenous
DNA construct encoding the desired polypeptide.

A "substantially pure" protein will be free
of other proteins, preferably at least 10%
homogeneous, more preferably 60% homogeneous, and most
preferably 95% homogeneous.

An "antigen" refers to a molecule containing
one or more epitopes that will stimulate a host's
immune system to make a humoral and/or cellular
antigen-specific response. The term is also used
interchangeably with "immunogen."

A "hapten" is a molecule containing one or
more epitopes that does not stimulate a host's immune
system to make a humoral or cellular response unless
linked to a carrier.

The term "epitope" refers to the site on an
antigen or hapten to which a specific antibody
molecule binds or which is recognized by T cells. The
term is also used interchangeably with "antigenic
determinant" or "antigenic determinant site."

An "immunological response" to a composition
or vaccine is the development in the host of a
cellular and/or antibody-mediated immune response to
the composition or vaccine of interest. Usually, such
a response consists of the subject producing
antibodies, B cells, helper T cells, suppressor T
cells, and/or cytotoxic T cells directed specifically
to an antigen or antigens included in the composition
or vaccine of interest.

The terms "immunogenic polypeptide" and
"immunogenic amino acid sequence" refer to a
polypeptide or amino acid sequence, respectively,
which elicits antibodies that neutralize viral
infectivity, and/or mediates antibody-complement or
antibody-dependent cell cytotoxicity to provide
protection of an immunized host. An "immunogenic
polypeptide" as used herein includes the full-length
(or near full-length) sequence of the desired protein
or an immunogenic fragment thereof.
By "immunogenic fragment" is meant a
fragment of a polypeptide which includes one or more
epitopes and thus elicits antibodies that neutralize
viral infectivity, and/or mediates antibody-complement
or antibody-dependent cell cytotoxicity to provide
protection of an immunized host. Such fragments will
usually be at least about 5 amino acids in length, and
preferably at least about 10 to 15 amino acids in
length. There is no critical upper limit to the
length of the fragment, which could comprise nearly
the full length of the protein sequence, or even a
fusion protein comprising fragments of two or more of
the antigens.

The term "treatment" as used herein refers to
treatment of a mammal, such as a bovine or the like,
to achieve either (i) the prevention of infection or
reinfection (prophylaxis), or (ii) the reduction or
elimination of symptoms of an infection. The vaccine
comprises either the recombinant BAV itself or
recombinant antigen produced by recombinant BAV.

By "infectious" is meant having the capacity
to deliver the viral genome into cells.

B. General Method 

The present invention identifies and
provides a means of deleting part or all of the
nucleotide sequence of the bovine adenovirus E1 and/or
E3 gene regions to provide sites into which
heterologous or homologous nucleotide sequences
encoding foreign genes or fragments thereof can be
inserted to generate bovine adenovirus recombinants.
By "deleting part of" the nucleotide sequence is meant
using conventional genetic engineering techniques to
delete the nucleotide sequence of part of the E1
and/or E3 region.

Various foreign genes or coding sequences
(prokaryotic and eukaryotic) can be inserted in the
bovine adenovirus nucleotide sequence, e.g., DNA, in
accordance with the present invention, particularly to
provide protection against a wide range of diseases,
and many such genes are already known in the art. The
problem heretofore has been to provide a safe,
convenient and effective vaccine vector for these
genes or coding sequences.
It is also possible that only fragments of
nucleotide sequences of genes can be used (where these
are sufficient to generate a protective immune
response) rather than the complete sequence as found
in the wild-type organism. Where available, synthetic
genes or fragments thereof can also be used. However,
the present invention can be used with a wide variety
of genes, fragments and the like, and is not limited
to those set out above.

In some cases the gene for a particular
antigen can contain a large number of introns or can
be from an RNA virus; in these cases a complementary
DNA copy (cDNA) can be used.

In order for successful expression of the
gene to occur, it can be inserted into an expression
vector together with a suitable promoter including
enhancer elements and polyadenylation sequences. A
number of eucaryotic promoter and polyadenylation
sequences which provide successful expression of
foreign genes in mammalian cells, and how to construct
expression cassettes, are known in the art, for
example in U.S. patent 5,151,267, the disclosures of
which are incorporated herein by reference. The
promoter is selected to give optimal expression of
immunogenic protein which in turn satisfactorily leads
to humoral, cell mediated and mucosal immune responses
according to known criteria.

The foreign protein produced by expression
in vivo in a recombinant virus-infected cell may
itself be immunogenic. More than one foreign gene can
be inserted into the viral genome to obtain successful
production of more than one effective protein.

Thus with the recombinant virus of the
present invention, it is possible to provide
protection against a wide variety of diseases
affecting cattle. Any of the recombinant antigenic
determinants or recombinant live viruses of the
invention can be formulated and used in substantially
the same manner as described for antigenic determinant
vaccines or live vaccine vectors.

The antigens used in the present invention
can be either native or recombinant antigenic
polypeptides or fragments. They can be partial
sequences, full-length sequences, or even fusions
(e.g., having appropriate leader sequences for the
recombinant host, or with an additional antigen
sequence for another pathogen). The preferred
antigenic polypeptide to be expressed by the virus
systems of the present invention contains full-length
(or near full-length) sequences encoding antigens.
Alternatively, shorter sequences that are antigenic
(i.e., encode one or more epitopes) can be used. The
shorter sequence can encode a "neutralizing epitope,"
which is defined as an epitope capable of eliciting
antibodies that neutralize virus infectivity in an in
vitro assay. Preferably the peptide should encode a
"protective epitope" that is capable of raising in the
host a "protective immune response," i.e., an
antibody- and/or a cell-mediated immune response that
protects an immunized host from infection.

The antigens used in the present invention,
particularly when comprised of short oligopeptides,
can be conjugated to a vaccine carrier. Vaccine
carriers are well known in the art: for example,
bovine serum albumin (BSA), human serum albumin (HSA)
and keyhole limpet hemocyanin (KLH). A preferred
carrier protein, rotavirus VP6, is disclosed in EPO
Pub. No. 0259149, the disclosure of which is
incorporated by reference herein.

Genes for desired antigens or coding
sequences thereof which can be inserted include those
of organisms which cause disease in mammals,
particularly bovine pathogens such as bovine
rotavirus, bovine coronavirus, bovine herpes virus
type 1, bovine respiratory syncytial virus, bovine
parainfluenza virus type 3 (BPI-3), bovine diarrhea
virus, Pasteurella haemolytica, Haemophilus somnus and
the like. The vaccines of the invention carrying
foreign genes or fragments can also be orally
administered in a suitable oral carrier, such as in an
enteric-coated dosage form. Oral formulations include
such normally-employed excipients as, for example,
pharmaceutical grades of mannitol, lactose, starch,
magnesium stearate, sodium saccharin, cellulose,
magnesium carbonate, and the like. Oral vaccine
compositions may be taken in the form of solutions,
suspensions, tablets, pills, capsules, sustained
release formulations, or powders, containing from
about 10% to about 95% of the active ingredient,
preferably about 25% to about 70%. An oral vaccine
may be preferable to raise mucosal immunity in
combination with systemic immunity, which plays an
important role in protection against pathogens
infecting the gastrointestinal tract.

In addition, the vaccine can be formulated into
a suppository. For suppositories, the vaccine
composition will include traditional binders and
carriers, such as polyalkylene glycols or
triglycerides. Such suppositories may be formed from
mixtures containing the active ingredient in the range
of about 0.5% to about 10% (w/w), preferably about 1%
to about 2%.

Protocols for administering to animals the
vaccine composition(s) of the present invention are
within the skill of the art in view of the present
disclosure. Those skilled in the art will select a
concentration of the vaccine composition in a dose
effective to elicit an antibody and/or T-cell mediated
immune response to the antigenic fragment. Within
wide limits, the dosage is not believed to be
critical. Typically, the vaccine composition is
administered in a manner which will deliver between
about 1 to about 1,000 micrograms of the subunit
antigen in a convenient volume of vehicle, e.g., about
1-10 cc. Preferably, the dosage in a single
immunization will deliver from about 1 to about 500
micrograms of subunit antigen, more preferably about
5-10 to about 100-200 micrograms (e.g., 5-200
micrograms).
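Choosing a dose and a vehicle volume fixes the antigen concentration the formulation must have. The sketch below is simple arithmetic under two assumptions: 1 cc is taken as 1 ml, and the 5-200 microgram example range above is used as the default check. The function names and the 100 microgram / 2 cc example are hypothetical.

```python
def antigen_concentration_ug_per_ml(dose_ug: float, volume_ml: float) -> float:
    """Concentration needed so that `volume_ml` of vehicle delivers `dose_ug`.

    1 cc is treated as 1 ml.
    """
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return dose_ug / volume_ml


def in_range(dose_ug: float, low_ug: float = 5.0, high_ug: float = 200.0) -> bool:
    """True if a single-immunization dose lies in the 5-200 microgram example range."""
    return low_ug <= dose_ug <= high_ug


if __name__ == "__main__":
    # Hypothetical example: 100 micrograms of subunit antigen in a 2 cc dose.
    print(antigen_concentration_ug_per_ml(100, 2))  # 50.0
    print(in_range(100))  # True
```

At the extremes of the broad range (1,000 micrograms in 1 cc versus 1 microgram in 10 cc), the required concentration spans four orders of magnitude.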

The timing of administration may also be
important. For example, a primary inoculation
preferably may be followed by subsequent booster
inoculations if needed. It may also be preferred,
although optional, to administer a second, booster
immunization to the animal several weeks to several
months after the initial immunization. To ensure
sustained high levels of protection against disease,
it may be helpful to readminister a booster
immunization to the animals at regular intervals, for
example once every several years. Alternatively, an
initial dose may be administered orally followed by
later inoculations, or vice versa. Preferred
vaccination protocols can be established through
routine vaccination protocol experiments.

The dosage for all routes of administration
of in vivo recombinant virus vaccine depends on
various factors, including the size of the patient,
the nature of the infection against which protection
is needed, the carrier and the like, and can readily
be determined by those of skill in the art. By way of
non-limiting example, a dosage of between 10^3 pfu and
10^8 pfu and the like can be used. As with in vitro
subunit vaccines, additional dosages can be given as
determined by the clinical factors involved.
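Delivering a dose expressed in pfu means working back from the titer of the virus stock. The sketch below shows that arithmetic; the stock titer of 10^9 pfu/ml and the 10^6 pfu dose in 1 ml are hypothetical values chosen only for illustration, as are the function names.

```python
def inoculum_volume_ml(dose_pfu: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock (ml) that contains `dose_pfu` plaque-forming units."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return dose_pfu / titer_pfu_per_ml


def dilution_factor_needed(titer_pfu_per_ml: float, dose_pfu: float,
                           dose_volume_ml: float) -> float:
    """Fold-dilution of the stock so that `dose_volume_ml` carries exactly `dose_pfu`."""
    target_titer = dose_pfu / dose_volume_ml
    factor = titer_pfu_per_ml / target_titer
    if factor < 1:
        raise ValueError("stock is too dilute for this dose in this volume")
    return factor


if __name__ == "__main__":
    # Hypothetical stock at 1e9 pfu/ml; dose of 1e6 pfu given in 1 ml.
    print(inoculum_volume_ml(1e6, 1e9))           # 0.001
    print(dilution_factor_needed(1e9, 1e6, 1.0))  # 1000.0
```

Diluting the stock to a convenient dose volume, rather than pipetting a microlitre-scale inoculum, is the usual reason to compute the dilution factor as well.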

In one embodiment of the invention, a number
of recombinant cell lines are produced according to
the present invention by constructing an expression
cassette comprising the BAV E1 region and transforming
host cells therewith to provide cell lines or cultures
expressing the E1 proteins. These recombinant cell
lines are capable of allowing a recombinant BAV,
having an E1 gene region deletion replaced by a
heterologous nucleotide sequence encoding a foreign
gene or fragment, to replicate and express the
desired foreign gene or fragment thereof which is
encoded within the recombinant BAV. These cell lines
are also extremely useful in generating recombinant
BAV, having an E3 gene deletion replaced by a
heterologous nucleotide sequence encoding a foreign
gene or fragment, by in vivo recombination following
DNA-mediated cotransfection.

In one embodiment of the invention, the
recombinant expression cassette can be obtained by
cleaving the wild-type BAV genome with an appropriate
restriction enzyme to produce a DNA fragment
representing the left end or the right end of the
genome, comprising E1 or E3 gene region sequences,
respectively; inserting the left or right end
fragment into a cloning vehicle, such as a plasmid;
and thereafter inserting at least one DNA sequence
encoding a foreign protein into the E1 or E3 deletion,
with or without the control of an exogenous promoter.
The recombinant expression cassette is contacted with
the wild-type BAV DNA through homologous recombination
or another conventional genetic engineering method
within an E1-transformed cell line to obtain the
desired recombinant.



The invention also includes an expression
system comprising a bovine adenovirus expression
vector wherein a heterologous nucleotide sequence,
e.g. DNA, replaces part or all of the E3 region and/or
part or all of the E1 region. The expression system
can be used wherein the foreign nucleotide sequence,
e.g. DNA, is with or without the control of any other
heterologous promoter.

The BAV E1 gene products of the adenovirus
of the invention transactivate most of the cellular
genes, and therefore, cell lines which constitutively
express E1 proteins can express cellular polypeptides
at a higher level than normal cell lines. The
recombinant mammalian, particularly bovine, cell lines
of the invention can be used to prepare and isolate
polypeptides, including (a) proteins associated with
adenovirus E1A proteins, e.g. p300, retinoblastoma
(Rb) protein, cyclins, kinases and the like; (b)
proteins associated with adenovirus E1B protein, e.g.
p53 and the like; (c) growth factors, such as
epidermal growth factor (EGF), transforming growth
factor (TGF) and the like; (d) receptors such as
epidermal growth factor receptor (EGF-R), fibroblast
growth factor receptor (FGF-R), tumor necrosis factor
receptor (TNF-R), insulin-like growth factor receptor
(IGF-R), major histocompatibility complex class I
receptor and the like; (e) proteins encoded by
proto-oncogenes, such as protein kinases
(tyrosine-specific protein kinases and protein kinases
specific for serine or threonine) and p21 proteins
(guanine nucleotide-binding proteins with GTPase
activity) and the like; (f) other cellular proteins
such as actins, collagens, fibronectins, integrins,
phospholipids, proteoglycans, histones and the like;
and (g) proteins involved in regulation of
transcription, such as TATA-box-binding protein (TBP),
TBP-associated factors (TAFs), SP1 binding protein and
the like.

The invention also includes a method for
providing gene therapy to a mammal in need thereof to
control a gene deficiency, which comprises
administering to said mammal a live recombinant bovine
adenovirus containing a foreign nucleotide sequence
encoding a non-defective form of said gene under
conditions wherein the recombinant virus vector genome
is incorporated into said mammalian genome or is
maintained independently and extrachromosomally to
provide expression of the required gene in the target
organ or tissue. These kinds of techniques are now
being used by those of skill in the art to replace a
defective gene or portion thereof. Examples of
foreign gene nucleotide sequences or portions thereof
that can be incorporated for use in conventional gene
therapy include the cystic fibrosis transmembrane
conductance regulator gene, the human minidystrophin
gene, the alpha1-antitrypsin gene and the like.


Examples 

Described below are examples of the present
invention. These examples are provided only for
illustrative purposes and are not intended to limit
the scope of the present invention in any way. In
light of the present disclosure, numerous embodiments
within the scope of the claims will be apparent to
those of ordinary skill in the art. The contents of
the references cited in the specification are
incorporated by reference herein.



Cells and viruses 

Cell culture media and reagents were
obtained from GIBCO/BRL Canada (Burlington, Ontario,
Canada). Media were supplemented with 25 mM Hepes and
50 µg/ml gentamicin. MDBK cells, or MDBK cells
transformed with a plasmid containing BAV3 E1
sequences, were grown in MEM supplemented with 10%
fetal bovine serum. The wild-type BAV3 (strain WBR-1;
Darbyshire et al, 1965 J. Comparative Pathology
75:327) was kindly provided by Dr. B. Darbyshire,
University of Guelph, Guelph, Canada. Working stocks
of wild-type BAV3 and of the BAV3-luciferase
recombinants, and virus titrations, were prepared in
MDBK cells.


Enzymes, bacteria and plasmids

Restriction endonucleases, polymerase chain
reaction (PCR) enzymes and other enzymes required for
DNA manipulations were purchased from Pharmacia LKB
Biotechnology (Canada) Ltd. (Dorval, Quebec, Canada),
Boehringer-Mannheim, Inc. (Laval or Montreal, Quebec,
Canada), New England BioLabs (Beverly, MA), or
GIBCO/BRL Canada (Burlington, Ontario, Canada) and
used as per manufacturer's instructions. Restriction
enzyme fragments of BAV3 DNA were inserted into pUC18
or pUC19 (Yanisch-Perron et al (1985) Gene 33:103-119)
following standard procedures (Sambrook et al (1989)
Molecular Cloning: A Laboratory Manual, 2nd ed. Cold
Spring Harbor Laboratory, New York). E. coli strain
DH5 (supE44 hsdR17 recA1 endA1 gyrA96 thi-1 relA1) was
transformed with recombinant plasmids by
electroporation (Dower et al. (1988) Nuc. Acids Res.,
16:6127-6145). Plasmid DNA was prepared using the
alkaline lysis procedure (Birnboim and Doly (1979)
Nuc. Acids Res., 7:1513-1523). The plasmid pSVOA/L,
containing the entire cDNA encoding firefly luciferase
(de Wet et al (1987) Mol. Cell. Biol. 7:725-737), was
a gift from D.R. Helinski, University of California,
San Diego, La Jolla, CA.

Construction of recombinant BAV3

MDBK cells transformed with a plasmid
containing BAV3 E1 sequences were cotransfected with
the wt BAV3 DNA digested with PvuI and the plasmid
pSM51-Luc (Figs. 9 and 10) using the
lipofection-mediated cotransfection protocol
(GIBCO/BRL, Life Technologies, Inc., Grand Island,
NY). The virus plaques produced following
cotransfection were isolated and plaque purified, and
the presence of the luciferase gene in the BAV3 genome
was detected by agarose gel electrophoresis of
recombinant virus DNA digested with appropriate
restriction enzymes.

Southern blot and hybridization

Mock or virus-infected MDBK cells were
harvested in lysis buffer (500 µg/ml pronase in 0.01 M
Tris, pH 7.4, 0.01 M EDTA, 0.5% SDS) and DNA was
extracted (Graham et al (1991) Manipulation of
adenovirus vectors. In: Methods in Molecular Biology,
7: Gene Transfer and Expression Techniques (Eds. Murray
and Walker) Humana Press, Clifton, N.J. pp. 109-128).
100 ng DNA was digested with BamHI, EcoRI or XbaI and
resolved on a 1% agarose gel by electrophoresis. DNA
bands from the agarose gel were transferred to a
GeneScreenPlus™ membrane (Du Pont Canada Inc. (NEN
Products), Lachine, Quebec, Canada) by the capillary
blot procedure (Southern, E.M. (1975) J. Mol. Biol.
98:503-517). Probes were labeled with 32P using an
Oligolabeling Kit (Pharmacia LKB Biotechnology
(Canada) Ltd., Dorval, Quebec, Canada) and the
unincorporated label was removed by passing the
labeled probe through a Sephadex G-50 column (Sambrook
et al (1989) supra). Probes were kept in a boiling
water bath for 2 min and used in hybridization
experiments following the GeneScreenPlus™
hybridization protocol. The DNA bands which hybridized
with the probe were visualized by autoradiography.


Luciferase assays 

The protocol was essentially the same as
described (Mittal et al (1993) Virus Res. 28:67-90).
Briefly, MDBK cell monolayers in 25 mm multi-well
dishes (Corning Glass Works, Corning, NY) were
infected in duplicate either with BAV3-Luc (3.1) or
BAV3-Luc (3.2) at an m.o.i. of 50 p.f.u. per cell. At
the indicated time points post-infection, recombinant
virus-infected cell monolayers were washed once with
PBS (0.137 M NaCl, 2.7 mM KCl, 8 mM Na2HPO4, 1.5 mM
KH2PO4) and harvested in 1 ml luciferase extraction
buffer (100 mM potassium phosphate, pH 7.8, 1 mM
dithiothreitol). The cell pellets were resuspended in
200 µl of luciferase extraction buffer and lysed by
three cycles of freezing and thawing. The
supernatants were assayed for luciferase activity.
For the luciferase assay, 20 µl of undiluted or
serially diluted cell extract was mixed with 350 µl of
luciferase assay buffer (25 mM glycylglycine, pH 7.8,
15 mM MgCl2, 5 mM ATP) in a 3.5 ml tube (Sarstedt
Inc., St-Laurent, Quebec, Canada). Up to 48 tubes can
be held in the luminometer rack, and the instrument
was programmed to inject 100 µl of luciferin solution
(1 mM luciferin in 100 mM potassium phosphate buffer,
pH 7.8) into the tube in the luminometer chamber to
start the enzyme reaction. The luminometer (Packard
Picolite Luminometer, Packard Instrument Canada, Ltd.,
Mississauga, Ontario, Canada) used in the present
study produced 300 to 450 light units of background
count in a 10 sec reaction time. Known amounts of
purified firefly luciferase were used in luciferase
assays to calculate the amount of active luciferase
present in each sample.
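Converting light units into an amount of active luciferase with purified standards is a standard-curve calculation. The sketch below assumes, for illustration only, that the background-subtracted signal is linear in enzyme amount over the range used, fits an ordinary least-squares line, and scales the result back by any serial dilution of the extract. The background value of 400 light units (within the 300 to 450 range reported), the ng units, the readings and the function names are all hypothetical.

```python
from statistics import mean


def fit_standard_curve(amounts_ng, light_units, background):
    """Least-squares line: net light units = slope * ng + intercept.

    `background` is subtracted from every standard reading first.
    """
    x = list(amounts_ng)
    y = [lu - background for lu in light_units]
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired standards")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def luciferase_ng(sample_lu, background, slope, intercept, dilution_factor=1.0):
    """Active luciferase in the assayed aliquot, scaled back by the serial dilution."""
    net = max(sample_lu - background - intercept, 0.0)
    return net / slope * dilution_factor


if __name__ == "__main__":
    # Hypothetical standards (ng of purified luciferase) and their readings.
    slope, intercept = fit_standard_curve([0, 1, 2, 4], [400, 1400, 2400, 4400], 400)
    # Hypothetical sample: a 1:10 dilution of extract reading 2900 light units.
    print(luciferase_ng(2900, 400, slope, intercept, dilution_factor=10))  # 25.0
```

Serially diluting strong extracts, as described above, keeps readings inside the linear part of the curve so the extrapolation stays valid.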


Western blotting 

Mock or virus-infected MDBK cells were lysed
in 1:2 diluted 2X loading buffer (80 mM Tris-HCl, pH
6.8, 0.67 M urea, 25% glycerol, 2.5% SDS, 1 M
mercaptoethanol, 0.001% bromophenol blue), boiled for
3 min and then centrifuged to pellet cell debris.
Proteins were separated by SDS-polyacrylamide gel
electrophoresis (SDS-PAGE) on 0.1% SDS-10%
polyacrylamide gels (Laemmli (1970) Nature
227:680-685). After the run, polypeptide bands in the
gel were electrophoretically transferred to a
nitrocellulose membrane (Bio-Rad Laboratories,
Richmond, CA). The membrane was incubated at room
temperature for 2 h with 1:4000 diluted rabbit
anti-luciferase antibody (Mittal et al (1993) supra).
The binding of anti-luciferase antibody to the
specific protein band(s) on the membrane was detected
with 1:5000 diluted horseradish peroxidase-conjugated
goat anti-rabbit IgG (Bio-Rad Laboratories, Richmond,
CA) and an ECL Western blotting detection system
(Amersham Canada Ltd., Oakville, Ontario).
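Making up the 1:4000 primary and 1:5000 secondary antibody solutions is a volume calculation. The sketch below assumes "1:N" means one part stock in N parts final volume; some laboratories read it as one part stock plus N parts diluent, which at these dilutions changes the stock volume by well under 0.1%. The 20 ml working volume and the function name are hypothetical.

```python
def stock_volume_ul(final_volume_ml: float, dilution: float) -> float:
    """Microlitres of antibody stock for a 1:`dilution` solution of `final_volume_ml`.

    Assumes "1:N" means 1 part stock in N parts final volume.
    """
    if dilution < 1:
        raise ValueError("dilution must be >= 1")
    return final_volume_ml * 1000.0 / dilution


if __name__ == "__main__":
    # Hypothetical 20 ml of each working solution.
    print(stock_volume_ul(20, 4000))  # 5.0 (rabbit anti-luciferase)
    print(stock_volume_ul(20, 5000))  # 4.0 (HRP-conjugated goat anti-rabbit IgG)
```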

Example 1 Cloning of BAV3 E1 Region DNA for sequencing
To complement the previously reported restriction site
maps (Kurokawa et al, 1978 J. Virol. 28:212-218; Hu et
al, 1984 J. Virol. 49:604-608), other restriction
enzyme sites in the BAV3 genome were defined. The 8.4
kilobase pair (kb) SalI B fragment, which extends from
the left end of the genome to approximately 24%, was
cloned into the SmaI-SalI sites of pUC18 essentially
as described previously (Graham et al, 1989 EMBO
Journal 8:2077-2085). Beginning at the left end of the
BAV3 genome, the relevant restriction sites used for
subsequent subcloning and their approximate positions
are: SacI (2%), EcoRI (3.5%), HindIII (5%), SacI
(5.5%), SmaI (5.6%) and HindIII (11%). Through the use
of appropriate restriction enzymes, the original
plasmid was collapsed to contain smaller inserts which
could be sequenced using the pUC universal primers.
Some fragments were also subcloned in both pUC18 and
pUC19 to allow confirmational sequencing in both
directions. These procedures, together with the use of
twelve different oligonucleotide primers hybridizing
with BAV3 sequences, allowed the BAV3 genome to be
sequenced from its left end to the HindIII site at 11%.
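Map positions given as a percentage of genome length translate into approximate base-pair coordinates once a genome length is assumed. The sketch below assumes a BAV3 genome of about 35,000 bp, which is consistent with the 8.4 kb SalI B fragment spanning roughly 24% of the genome; `GENOME_BP`, `SITES` and the function names are hypothetical, and because the percentages are approximate the sizes are only estimates.

```python
GENOME_BP = 35_000  # assumed BAV3 genome length (8.4 kb is about 24% of this)

# Approximate map positions (% of genome from the left end) listed above.
SITES = [
    ("SacI", 2.0),
    ("EcoRI", 3.5),
    ("HindIII", 5.0),
    ("SacI", 5.5),
    ("SmaI", 5.6),
    ("HindIII", 11.0),
]


def map_units_to_bp(percent: float, genome_bp: int = GENOME_BP) -> int:
    """Convert a map position in % of genome length to an approximate bp value."""
    return round(percent / 100 * genome_bp)


def fragments(sites, genome_bp: int = GENOME_BP):
    """Approximate sizes (bp) between consecutive sites, starting at the left end."""
    edges = [("left end", 0.0)] + sorted(sites, key=lambda s: s[1])
    return [
        (left[0], right[0], map_units_to_bp(right[1] - left[1], genome_bp))
        for left, right in zip(edges, edges[1:])
    ]


if __name__ == "__main__":
    print(map_units_to_bp(24))  # 8400, matching the 8.4 kb SalI B fragment
    for left, right, size in fragments(SITES):
        print(f"{left} -> {right}: ~{size} bp")
```

The roughly 35 bp estimated between the SacI (5.5%) and SmaI (5.6%) sites shows the limit of percentage-level map resolution; exact coordinates come from the sequence itself.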

To ensure that some features of the sequence
obtained were not unique to the initial clone selected
for sequencing, two more pUC19 clones were prepared
containing the SalI fragment from a completely
independent DNA preparation. These clones were used
to confirm the original sequence for the region from
approximately 3% to 5.5% of the BAV3 genome.

DNA sequencing reactions were based on the
chain-termination method (Sanger et al. 1977 PNAS USA
74:5463-5467) and manual sequencing followed the DNA
sequencing protocol described in the Sequenase™ kit
produced by US Biochemical. [α-35S]dATP was obtained
from Amersham Canada Ltd. All oligonucleotides used
as primers were synthesized by the Central Facility of
the Molecular Biology and Biotechnology Institute
(MOBIX) at McMaster University, Hamilton, Ontario.
The entire region (0 to 11%) of the BAV3 genome was
sequenced by at least two independent determinations
for each position by automated sequencing on a 373A
DNA Sequencer (Applied Biosystems) using Taq-Dye
terminators. Over half of the region was further
sequenced by manual procedures to confirm overlaps and
other regions of interest.
DNA sequence analysis and protein
comparisons were carried out with the MICROGENIE
program.

Example 2 Coding Sequences of the BAV3 E1 Region
5 BAV3 genomic DNA, from the left end of the 

genome to the Hindlll site at approximately 11%, was 
cloned into plasmids and sequenced by a combination of 
manual and automated sequencing. An examination of 
the resultant BAV3 El genomic sequence (Fig 1) 

10 revealed a number of interesting features relevant 
both to transactivation and to other functions 
associated with adenovirus El proteins. On the basis 
of open reading frames (ORFs) it was possible to 
assign potential coding regions analogous to those 

15 defined in human Ad5 (HAd5) . As shown in Fig 1, ORFs 
corresponding roughly to the first exon and unique 
region of HAd5 E1A as well are ORFs corresponding to 
the 19k and 58k proteins of E1B and the ORF 
corresponding to protein IX were all defined in this 

20 sequence. The open reading frame defining the 

probable El A coding region begins at the ATG at nt 606 
and continues to a probable splice donor site at 
position 1215. The first consensus splice acceptor 
site after this is located after nt 1322 and defines 

25 an intron of 107 base pairs with an internal consensus 
splice branching site at position 1292. The putative 
BAV3 E1A polypeptide encoded by a message 
corresponding to these splice sites would have 211 
amino acids and a unmodified molecular weight of 

30 23,323. The major homology of the protein encoded by 
this ORF and HAd5 E1A is in the residues corresponding 
to CR3 (shown in Fig 2). The homology of amino acid 
sequences on both sides of the putative intron 
strengthens the assignment of probable splice donor 

35 and acceptor sites. The CR3 has been shown to be of 
prime importance in the transactivation activity of 
HAd5 EIA gene products. As seen in Fig. 2 A the 
homology of this sequence in the BAV3 protein to the 
corresponding region of the 289R EIA protein of HAd5 
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includes complete conservation of the 
CysX 2 CysX n CysX 2 Cys sequence motif which defines the 
metal binding site of this protein (Berg, 1986 Science 
232 :485-487) as well as conservation of a number of 
5 amino acids within this region and within the promoter 
binding region as defined by Lillie and Green 1989 
Nature 338 :39-44) . 
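As an arithmetic check of the E1A assignment above, the short sketch below recomputes the intron length and the size of the spliced product from the quoted coordinates. It assumes 1-based inclusive numbering, that nt 1215 is the last nucleotide of exon 1, that the intron ends at nt 1322, and that nt 1346 is the first base of the TGA stop codon; the function name is illustrative only.

```python
def spliced_orf_residues(atg: int, donor_last: int, acceptor_last: int,
                         stop: int) -> tuple:
    """Return (intron_length, residues) for a two-exon open reading frame.

    Coordinates are 1-based and inclusive (an assumed convention):
    atg           first nucleotide of the initiation codon
    donor_last    last nucleotide of exon 1, just before the splice donor
    acceptor_last last nucleotide of the intron, just before the acceptor
    stop          first nucleotide of the termination codon
    """
    intron = acceptor_last - donor_last      # nts donor_last+1 .. acceptor_last
    exon1 = donor_last - atg + 1             # coding nts in exon 1
    exon2 = stop - (acceptor_last + 1)       # coding nts in exon 2, stop excluded
    coding = exon1 + exon2
    if coding % 3:
        raise ValueError("spliced coding sequence is not a whole number of codons")
    return intron, coding // 3


# BAV3 E1A coordinates quoted in the text
intron_bp, residues = spliced_orf_residues(atg=606, donor_last=1215,
                                           acceptor_last=1322, stop=1346)
print(f"intron: {intron_bp} bp, E1A product: {residues} residues")
```

Under these assumptions the coordinates agree with the 107 bp intron and the 211-residue product (610 + 23 = 633 coding nucleotides).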

The only other region of significant homology between
the BAV3 E1A protein and that of HAd5 was a stretch of
amino acids known to be important in binding of the
cellular Rb protein to the HAd5 E1A protein (Dyson et
al, 1990 J. Virol. 64:1353-1356). As shown in Fig 2B,
this sequence, which is located between amino acids
120 and 132 in the CR2 region of HAd5 E1A, is found
near the amino (N-) terminus of the BAV3 protein,
between amino acids 26 and 37.

An open reading frame from the ATG at nt 1476 to the
termination signal at nt 1947 defines a protein of 157
amino acids with two regions of major homology to the
HAd5 E1B 19k protein. As shown in Fig 3, both the BAV3
and the HAd5 proteins have a centrally located
hydrophobic amino acid sequence. The sequence in BAV3,
with substitutions of valine for alanine and leucine
for valine, should form a somewhat more hydrophobic
pocket than the corresponding HAd5 region. The other
portion of HAd5 19k that may be conserved in the BAV3
protein is the serine-rich sequence found near the
N-terminus (residues 20 to 26) in HAd5 19k and near
the C-terminus (residues 136 to 142) in the BAV3
protein (also shown in Fig 3).

An ORF beginning at the ATG at nt 1850 and terminating
at nt 3110 overlaps the preceding BAV3 protein reading
frame and thus has the same relationship to it as the
HAd5 E1B 56k protein has to the E1B 19k protein. As
shown in Fig 4, this 420R BAV3 protein and the
corresponding 496R HAd5 E1B 56k protein show
considerable sequence homology over their C-terminal
346 residues. The N-terminal regions of these proteins
(not depicted in the figure) show no significant
homology and differ in overall length.

Following the E1B ORFs, the open reading frame
beginning at nt 3200 and ending at the translation
terminator TAA at nt 3575 defines a protein of 125R
with an unmodified molecular weight of 13,706. As seen
in Fig 5, this protein shares some homology with the
structural protein IX of HAd5, particularly in its
N-terminal sequences.
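The unspliced ORFs described above can be checked the same way. The sketch below assumes that each quoted ATG and termination position is the first nucleotide of its codon, so that the residue count is the distance divided by three; it also computes the frame offset between the two overlapping E1B ORFs. The dictionary and function names are illustrative.

```python
def orf_residues(atg: int, stop: int) -> int:
    """Residues encoded from `atg` up to (not including) the stop codon.

    Both coordinates are taken as 1-based first nucleotides of their codons
    (an assumed convention).
    """
    span = stop - atg
    if span % 3:
        raise ValueError("ATG and stop codon are not in the same reading frame")
    return span // 3


# (ATG, first nt of stop codon, residues reported in the text)
BAV3_E1_ORFS = {
    "E1B 19k-like": (1476, 1947, 157),
    "E1B 56k-like": (1850, 3110, 420),
    "protein IX-like": (3200, 3575, 125),
}

for name, (atg, stop, reported) in BAV3_E1_ORFS.items():
    print(f"{name}: {orf_residues(atg, stop)} residues (reported {reported})")

# The two E1B ORFs overlap; a non-zero offset means different reading frames.
frame_offset = (1850 - 1476) % 3
print(f"frame offset between the E1B ORFs: {frame_offset}")
```

All three counts match the reported sizes, and the non-zero offset is consistent with the two E1B ORFs occupying different frames, as the HAd5 19k and 56k ORFs do.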

Possible Transcription Control Regions in BAV3 E1

The inverted terminal repeats (ITR) at the ends of the
BAV3 genome have been shown to extend to 195 nt
(Shinagawa et al, 1987 Gene 55:85-93). The GC-rich 3'
portion of the ITR contains a number of consensus
binding sites for the transcription-stimulating
protein SP1 (Dynan and Tjian (1983) Cell 35:79-87),
and possible consensus sites for the adenovirus
transcription factor (ATF) (Lee et al. (1987) Nature
325:368-372) occur at nts 60 and 220. While there are
no exact consensus sites for the factors EF-1A (Bruder
and Hearing (1989) Mol. Cell Biol. 9:5143-5153) or E2F
(Kovesdi et al, 1987 PNAS USA 84:2180-2184) upstream of
the ATG at nt 606, there are numerous degenerate
sequences which may define an enhancer region
comparable to that seen in HAd5 (Hearing and Shenk,
1986 Cell 45:229-236).

The proposed BAV3 E1A coding sequence terminates at a
TGA codon at nt 1346, which is located within a 35
base pair sequence that is immediately followed by a
direct repeat of itself (see Fig 1). Two repeats of
this sequence were detected in three independently
derived clones from a plaque-purified stock of BAV3.
The number of direct repeats may vary within any BAV3
population, although plaque purification allows
isolation of a relatively homogeneous population of
viruses. Whether these direct repeats can function as
promoter or enhancer elements for E1B transcription is
being tested.

WO 95/16048

PCT/CA94/00678

There are no strong polyA addition consensus sites
between the E1A and the E1B coding sequences; in fact,
no AATAA sequence is found until after the protein IX
coding sequences following E1B. The TATAAA sequence
beginning at nt 1453 could function as the proximal
promoter for E1B, but it is located closer to the ATG
at nt 1476 than is considered usual (McKnight et al,
1982 Science 217:316-322). The TATA sequence located
further upstream, immediately before the proposed E1A
intron sequence, also seems inappropriately positioned
to serve as a transcription box for the E1B proteins.
There are clearly some unique features in this region
of the BAV3 genome.

The transcriptional control elements for the protein
IX transcription unit are conventional and well
defined. Almost immediately following the open reading
frame for the larger E1B protein there is, at nt 3117,
an SP1 binding sequence. This is followed at nt 3135
by a TATAAAT sequence which could promote a transcript
for the protein IX open reading frame beginning at the
ATG at nt 3200 and ending with the TAA at nt 3575. One
polyA addition sequence begins within the translation
termination codon, and four other AATAA sequences are
located at nts 3612, 3664, 3796 and 3932.
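Locating elements such as the TATA boxes and polyadenylation signals discussed above is a simple motif scan. The sketch below is a minimal illustration, assuming 1-based numbering as in the text and using the canonical AATAAA hexamer; it runs on a hypothetical toy sequence rather than BAV3 data, and the helper name is illustrative.

```python
import re


def find_motif(seq: str, motif: str) -> list:
    """Return 1-based start positions of every (possibly overlapping) match."""
    lookahead = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() + 1 for m in lookahead.finditer(seq.upper())]


# Hypothetical toy sequence; the real BAV3 E1 sequence is shown in Fig 1.
toy = "GGTATAAATCCAATAAAGCAATAAA"
print("TATAAA at", find_motif(toy, "TATAAA"))
print("AATAAA at", find_motif(toy, "AATAAA"))

# Spacing quoted in the text between the candidate E1B TATA box and its ATG
print("TATAAA (nt 1453) to ATG (nt 1476):", 1476 - 1453, "nt")
```

The zero-width lookahead is used so that overlapping occurrences (common in A/T-rich signals) are all reported, which a plain `re.finditer` on the motif itself would miss.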

In keeping with the general organization of the E1A
region of other adenoviruses, the BAV3 E1A region
contains an intron sequence with translation
termination codons in all three reading frames, which
is therefore probably removed by splicing from all E1A
mRNA transcripts. The largest possible protein
produced from the BAV3 E1A region would have 211 amino
acid residues and is the equivalent of the 289 amino
acid protein translated from the 13S mRNA of HAd5. Two
striking features in a comparison of these proteins
are the high degree of homology in a region
corresponding to CR3 and the absence in BAV3 of most
of the amino acids corresponding to the second exon of
HAd5. In fact the only amino acids encoded in the
acidic residues in the 18 residues amino to the metal
binding domain.
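The argument that the E1A intron must be spliced out rests on it carrying stop codons in all three reading frames, so that no unspliced message could read through it. The sketch below shows that check on hypothetical toy sequences (the intron itself is in Fig 1, not reproduced here); the function names are illustrative.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def frames_with_stop(seq: str) -> set:
    """Return the reading frames (0, 1, 2) that contain an in-frame stop codon."""
    seq = seq.upper()
    found = set()
    for frame in range(3):
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                found.add(frame)
                break
    return found


def blocked_in_all_frames(seq: str) -> bool:
    """True if translation through `seq` would terminate whatever the frame."""
    return frames_with_stop(seq) == {0, 1, 2}


# Hypothetical toy "intron": TAA in frame 0, TGA in frame 1, TAG in frame 2
print(blocked_in_all_frames("TAACTGACTAG"))   # True
print(blocked_in_all_frames("ATGGCCATG"))     # False (no stop codon at all)
```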

The other interesting feature of BAV3 E1A, which is
undoubtedly relevant to the oncogenic potential of this
virus, is the presence of the sequence
Asp27-Leu-Glu-Cys-His-Glu, which conforms to a core
sequence known to be important in the binding of
cellular Rb and related proteins by the transforming
proteins of a number of DNA tumour viruses (Dyson et
al, 1990 supra). From deletion mutant analysis there
is a clear association between the potential of HAd5
E1A proteins to bind Rb and the ability of the protein
to induce morphological transformation in appropriate
cells (see references in Dyson et al, 1990 supra). The
BAV3 E1A protein is distinct from its HAd5 counterpart
in the relative position of this Rb binding sequence,
which is in CR2 of HAd5 E1A but near the N-terminus of
the BAV3 E1A protein.

Through the use of alternative splice sites, HAd5 E1A
transcripts can give rise to at least 5 distinct mRNA
species (Berk et al, 1978 Cell 14:695-711; Stephens et
al, 1987 EMBO Journal 6:2027-2035). Whether BAV3, like
HAd5, can generate a number of different mRNA species
through the use of alternative splice sites in the E1A
transcripts remains to be determined. For example, a
potential splice donor site which could delete the
sequence equivalent to the unique region of HAd5 E1A
is present immediately after nt 1080, but it is not
known whether this site is actually used.

HAd5 E1B encodes two proteins (19k and 56k), either of
which can cooperate with E1A, by pathways which are
additive and therefore presumably independent (McLorie
et al, 1991 J. Gen. Virol. 72:1467-1471), to produce
morphological transformation of cells in culture (see
for example: Branton et al, 1985 supra; Graham, 1984
supra). The significance of the conservation of the
hydrophobic stretch of amino acids in the central
portion of the shorter E1B proteins of HAd5 and BAV3
is not yet clear. A second short region of homology,
Gln-Ser-Ser-X-Ser-Thr-Ser, at residue 136 near the
C-terminus of the BAV3 protein is located near the
N-terminus, at residue 20, in the HAd5 19k protein.
The major difference in both length and sequence
between the larger (420R) E1B protein of BAV3 and the
corresponding HAd5 protein (496R) is confined to the
N-terminus of these proteins. The two proteins show
considerable evolutionary homology in the 345 amino
acids that extend to their C-termini. A similar degree
of homology extends into the N-terminal halves of
protein IX of BAV3 and HAd5. Taken together, these
analyses suggest that while BAV3 and the human
adenoviruses have diverged by simple point mutational
events in some regions, more dramatic genetic events
such as deletion and recombination may have been
operating in other regions, particularly those
defining the junction between E1A and E1B.

Example 3 Cloning and sequencing of the BAV3 E3 and
fibre genes

The general organization of adenovirus genomes seems
to be relatively well conserved, so it was possible to
predict, from the locations of a number of HAd E3
regions, that BAV E3 should lie between map units
(m.u.) 77 and 86. To prepare DNA for cloning and
sequencing, BAV3 (strain WBR-1) was grown in
Madin-Darby bovine kidney (MDBK) cells, virions were
purified and DNA was extracted (Graham, F.L. & Prevec,
L. (1991) Methods in Molecular Biology, vol. 7, Gene
Transfer and Expression Protocols, pp. 109-146. Edited
by E.J. Murray. Clifton, New Jersey: Humana Press).
Previously published restriction maps for EcoRI and
BamHI (Kurokawa et al., 1978) were confirmed (Fig. 6).
The BamHI D and EcoRI F fragments of BAV3 DNA were
isolated and inserted into pUC18 and pUC19 vectors,
and nested sets of deletions were made using
exonuclease III and S1 nuclease (Henikoff, S. (1984)
Gene, 28:351-359). The resulting clones were sequenced
by the dideoxynucleotide chain termination technique
(Sanger, F., Nicklen, S. & Coulson, A.R. (1977)
Proceedings of the National Academy of Sciences,
U.S.A., 74:5463-5467). The nucleotide sequence from
positions 1 to 287 was obtained from the right end of
the BamHI B fragment (Fig. 6). The sequences of the
regions spanning (i) the BamHI site at nucleotide 3306
and the EcoRI site at nucleotide 3406, and (ii) the
EcoRI site at nucleotide 4801 and nucleotide 5100 were
obtained from a plasmid containing the XbaI C fragment
(m.u. 83 to 100; not shown) using primers hybridizing
to BAV3 sequences. Analysis of the sequence was
performed with the aid of the PC/GENE sequence
analysis package developed by Amos Bairoch, Department
of Medical Biochemistry, University of Geneva,
Switzerland.

The 5100 nucleotide sequence which extends between 77
and 92 m.u. of the BAV3 genome is shown in Fig. 7. The
upper strand contains 14 open reading frames (ORFs)
which could encode polypeptides of 60 amino acid
residues or more (Figs. 6 and 7). The lower strand
contains no ORF that could encode a protein longer
than 50 amino acids after an initiation codon. The
predicted amino acid sequence for each ORF on the
upper strand was analyzed for homology with predicted
amino acid sequences from several sequenced Ads: HAd2
(Herisse, J., Courtois, G. & Galibert, F. (1980)
Nucleic Acids Research, 8:2173-2192; Herisse, J.,
Courtois, G. & Galibert, F. (1981) Nucleic Acids
Research, 9:1229-1249), -3 (Signas, C., Akusjarvi, G.
& Pettersson, U. (1985) Journal of Virology,
53:672-678), -5 (Cladaras, C. & Wold, W.S.M. (1985)
Virology, 140:28-43), -7 (Hong, J.S., Mullis, K.G. &
Engler, J.A. (1988) Virology, 167:545-553) and -35
(Flomenberg, P.R., Chen, M. & Horwitz, M.S. (1988)
Journal of Virology, 62:4431-4437), murine Ad1 (MAd1)
(Raviprakash, K.S., Grunhaus, A., El Kholy, M.A. &
Horwitz, M.S. (1989) Journal of Virology, 63:5455-5458)
and canine Ad1 (CAd1) (Dragulev, B.P., Sira, S.,
Abouhaidar, M.G. & Campbell, J.B. (1991) Virology,
183:298-305). Three of the BAV3 ORFs exhibited homology
with the characterized HAd proteins pVIII, fibre and
the 14.7K E3 protein. The amino acid sequence
predicted from BAV3 ORF 1 shows an overall identity of
approximately 55% when compared to the C-terminal 75%
of HAd2 pVIII (Cladaras & Wold, 1985, supra) (Fig.
8a), indicating that ORF 1 encodes the right end of
BAV3 pVIII. Near the C-terminal end of BAV3 pVIII
there is a 67 amino acid stretch (residues 59 to 125;
Fig. 8a) which has 75% identity with HAd2 pVIII. This
region has previously been shown to be highly
conserved among different Ads (Cladaras & Wold, 1985,
supra; Signas, C., Akusjarvi, G. & Pettersson, U.
(1986) Gene, 50:173-184; Raviprakash et al., 1989,
supra; Dragulev et al., 1991, supra).
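The ORF survey above (ATG-initiated frames of at least 60 codons on the upper strand, none longer than 50 codons on the lower strand) can be reproduced with a straightforward three-frame scan of each strand. This is a minimal sketch on a hypothetical toy sequence, not the PC/GENE procedure actually used; the function names are illustrative.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the lower strand read 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_residues: int) -> list:
    """List (atg_pos, stop_pos, residues) for ATG-initiated ORFs on one strand.

    Positions are 1-based first nucleotides of the ATG and of the stop codon.
    Only the first ATG of each ORF is reported (the longest product), and
    ORFs that run off the end of the sequence without a stop are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the end of the sequence
            n_residues = (j - i) // 3
            if n_residues >= min_residues:
                orfs.append((i + 1, j + 1, n_residues))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs)


toy = "ATGAAATAGCCATGCCCTGA"  # hypothetical; not BAV3 sequence
print(find_orfs(toy, min_residues=2))
print(find_orfs(reverse_complement(toy), min_residues=2))
```

For a real survey one would call `find_orfs(upper, 60)` on the upper strand and check that `find_orfs(reverse_complement(upper), 51)` returns nothing; dedicated packages may differ in detail, for example in how they treat alternative start codons.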

The fibre protein is present on the surface of the
virion as long projections from each vertex of the
icosahedral capsid and is involved in a number of Ad
functions, including attachment of the virus to the
cell surface during infection, assembly of virions and
antigenicity (Philipson, L. (1983) Current Topics in
Microbiology and Immunology, 109:1-52). On the basis
of the primary structure of the HAd2 fibre protein, it
has been proposed that the shaft region (between amino
acid residues 40 and 400) is composed of a number of
repeating structural motifs of about 15 residues, with
hydrophobic residues organized in two short β-sheets
and two β-bends (Green, N.M., Wrigley, N.G., Russell,
W.C., Martin, S.R. & McLachlan, A.D. (1983) EMBO
Journal, 2:1357-1365). The amino acid sequences at the
N-terminus of the BAV3 ORF 6-encoded protein share
about 60% identity with the HAd2 fibre protein tail,
but there is little or no similarity in the knob
region, and about 45% identity overall (Fig. 8c). The
BAV3 fibre gene would encode a protein of 976 residues
if no splicing occurs, i.e. 394 amino acid residues
longer than the HAd2 fibre protein. The number of
repeating motifs in the shaft region of the fibre
protein from different Ads varies between 28 and 23
(Signas et al., 1985, supra; Chroboczek, J. & Jacrot,
B. (1987) Virology, 161:549-554; Hong et al., 1988,
supra; Raviprakash et al., 1989, supra; Dragulev et
al., 1991, supra). The BAV3 fibre protein can be
organized into 52 such repeats in this region (not
shown), which would account for most of the difference
in size compared to the fibre proteins of HAd2, HAd3,
HAd5, HAd7, CAd1 and MAd1 (Signas et al., 1985, supra;
Herisse et al., 1980, supra; Herisse & Galibert, 1981,
supra; Hong et al., 1988, supra; Raviprakash et al.,
1989, supra; Dragulev et al., 1991, supra).
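The identity figures quoted in this example (about 55% for pVIII, 60% for the fibre tail, 45% overall) depend on how an alignment is scored. The sketch below shows one common convention for percent identity over a gapped pairwise alignment; it is illustrative, uses hypothetical toy peptides, and is not claimed to reproduce the PC/GENE calculation. It also spells out the length arithmetic implied for the HAd2 fibre protein.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns with a gap in either sequence are skipped (an assumed convention;
    other tools divide by alignment length or by the shorter sequence).
    """
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned peptides (not BAV3 or HAd2 data)
print(round(percent_identity("MKT-LLV", "MRTALLI"), 1))  # 66.7

# Length arithmetic from the text: 976 residues is 394 longer than HAd2 fibre
had2_fibre_residues = 976 - 394
print(had2_fibre_residues)  # 582
```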

HAd2 and HAd5 E3 lies between the pVIII and the fibre
genes and encodes at least 10 polypeptides (Cladaras &
Wold, 1985, supra). The promoter for E3 of these two
serotypes lies within the sequences encoding pVIII,
about 320 bp 5' of the termination codon. No consensus
TATA box is found in the corresponding region of the
BAV3 sequences. A non-canonical polyadenylation signal
(ATAAA) for E3 transcripts is located at position
1723, between the end of the putative E3 region and
the beginning of ORF 6, which encodes the fibre
protein, and two consensus signals are located within
ORF 6 at positions 2575 and 3565. The polyadenylation
signal for the fibre protein is located at nucleotide
4877. Six ORFs were identified in the BAV3 genome
between the pVIII and the fibre genes, but only four
(ORFs 2, 3, 4 and 5) have the potential to encode
polypeptides of at least 50 amino acids after an
initiation codon (Fig. 7). The amino acid sequence
predicted to be encoded by ORF 2 is 307 residues long
and contains eight potential N-glycosylation sites
(Fig. 7) as well as a hydrophobic sequence, between
residues 262 and 289, which may be a transmembrane
domain (PLLFAFVLCTGCAVLLTAFGPSILSGT). This domain may
be part of a protein homologous to the HAd2 and HAd5
19K E3 glycoprotein (Cladaras & Wold, 1985, supra) and
to the proposed CAd1 22.2K protein (Dragulev et al.,
1991, supra), but ORF 2 does not show appreciable
homology with these proteins. ORF 4 shows
approximately 44% identity with the 14.7K E3 protein
of HAd5 (Figs. 6 and 8b), which has been shown to
prevent lysis of virus-infected mouse cells by tumour
necrosis factor (Gooding, L.R., Elmore, L.W.,
Tollefson, A.E., Brody, H.A. & Wold, W.S.M. (1988)
Cell, 53:341-346; Wold, W.S.M. & Gooding, L.R. (1989)
Molecular Biology and Medicine, 6:433-452). Analysis
of the 14.7K protein sequence from HAd2, -3, -5 and -7
has revealed a highly conserved domain, which in HAd5
lies between amino acid residues 41 and 56 (Horton,
T.M., Tollefson, A.E., Wold, W.S.M. & Gooding, L.R.
(1990) Journal of Virology, 64:1250-1255). The
corresponding region in the BAV3 ORF 4-encoded
protein, between amino acids 70 and 85, contains 11
amino acids identical to those of the HAd5 14.7K
protein conserved domain (Fig. 8b).
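The suggestion that residues 262 to 289 of the ORF 2 product form a transmembrane domain can be examined with a hydropathy profile. The sketch below uses the published Kyte-Doolittle scale and a 19-residue window, for which an average above roughly 1.6 is commonly taken to suggest a membrane-spanning segment; it is an illustrative check, not the analysis reported here, and the helper names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide: str) -> float:
    """Average Kyte-Doolittle value (GRAVY) over the whole peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def window_profile(peptide: str, window: int = 19) -> list:
    """Mean hydropathy of every full window of `window` residues."""
    return [mean_hydropathy(peptide[i:i + window])
            for i in range(len(peptide) - window + 1)]


ORF2_SEGMENT = "PLLFAFVLCTGCAVLLTAFGPSILSGT"  # sequence quoted in the text
profile = window_profile(ORF2_SEGMENT)
print(f"GRAVY: {mean_hydropathy(ORF2_SEGMENT):.2f}")
print(f"highest 19-residue window: {max(profile):.2f}")
```

Both the whole-segment average and the best 19-residue window exceed the 1.6 guideline, which is consistent with, though not proof of, a membrane-spanning segment.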

The BAV3 E3 region appears to be approximately 1.5 kbp
long, about half the size of those of HAd2 and -5
(Cladaras & Wold, 1985, supra), and novel splicing
events in BAV3 E3 would be required to generate more
homologues to the HAd3 E3 proteins. A similarly short
E3 region has been reported for MAd1 (Raviprakash et
al., 1989, supra) and CAd1 (Dragulev et al., 1991,
supra).



Example 4 Construction of BAV3-luciferase recombinants

Adenovirus-based mammalian cell expression vectors
have gained tremendous importance in the last few
years as vehicles for recombinant vaccine delivery,
and also in gene therapy. BAV3-based expression
vectors have considerable potential for developing
novel recombinant vaccines for veterinary use. To show
that BAV3 E3 gene products are not essential for virus
growth in cultured cells and that this locus could be
used to insert foreign DNA sequences, a 1.7 kb
fragment containing the firefly luciferase gene was
introduced into the 696 bp deletion of the E3 region
of the BAV3 genome, in the E3-parallel orientation, to
generate a BAV3 recombinant.
The rationale for using the luciferase gene is that it
acted as a highly sensitive reporter gene when
introduced into the E3 region of the HAd5 genome to
generate HAd5-Luc recombinants (Mittal et al (1993)
Virus Res. 28:67-90).

To facilitate the insertion of the firefly luciferase
gene into the E3 region of the BAV3 genome, a BAV3 E3
transfer vector containing the luciferase gene was
constructed (Fig. 9). The BAV3 E3 region falls
approximately between m.u. 77 and 82. In our first
series of vectors we replaced a 696 bp XhoI-NcoI E3
deletion (between m.u. 78.8 and 80.8) with NruI and
SalI cloning sites for insertion of foreign genes to
obtain pSM14del2. A 1716 bp BsmI-SspI fragment
containing the luciferase gene was isolated and first
inserted into the E3 locus of an intermediate plasmid,
pSM41, at the SalI site by blunt-end ligation to
generate pSM41-Luc. The luciferase gene, without any
exogenous regulatory sequences, was inserted into the
E3 locus in the same orientation as the E3
transcription unit. The kan^r gene was inserted into
pSM41-Luc at the XbaI site present within the
luciferase gene to generate an amp^r/kan^r plasmid,
pSM41-Luc-Kan. A 7.7 kb fragment containing the BAV3
sequences along with the luciferase gene and the kan^r
gene was obtained from pSM41-Luc-Kan by digestion with
BamHI and inserted into an amp^r plasmid, pSM51,
partially digested with BamHI, replacing a 3.0 kb
BamHI fragment (between m.u. 77.8 and 86.4) to
generate a doubly resistant (kan^r and amp^r) plasmid,
pSM51-Luc-Kan. The kan^r gene was deleted from
pSM51-Luc-Kan by partial cleavage with XbaI to
generate pSM51-Luc, containing the luciferase gene in
the E3-parallel orientation.
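Map units express positions as a percentage of genome length, so with a genome of about 35 kb one map unit corresponds to roughly 350 bp. The sketch below, which assumes exactly 35,000 bp purely for illustration, converts the map coordinates quoted above and shows that they agree with the stated fragment sizes to within the precision of restriction mapping. The helper names are hypothetical.

```python
GENOME_BP = 35_000  # assumed BAV3 genome length (about 35 kb)


def mu_to_bp(map_units: float, genome_bp: int = GENOME_BP) -> float:
    """Convert map units (percent of genome length) to base pairs."""
    return map_units / 100 * genome_bp


def span_bp(start_mu: float, end_mu: float, genome_bp: int = GENOME_BP) -> float:
    """Approximate size in bp of the interval between two map positions."""
    return mu_to_bp(end_mu - start_mu, genome_bp)


# (start m.u., end m.u., size quoted in the text in bp)
INTERVALS = {
    "XhoI-NcoI E3 deletion": (78.8, 80.8, 696),
    "BamHI fragment replaced in pSM51": (77.8, 86.4, 3000),
}

for name, (start, end, quoted) in INTERVALS.items():
    print(f"{name}: ~{round(span_bp(start, end))} bp (quoted ~{quoted} bp)")
```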
MDBK cells transformed with a plasmid containing the
BAV3 E1 sequences were cotransfected with wt BAV3 DNA
digested with PvuI, which cuts twice within the BAV3
genome (at m.u. 65.7 and 71.1), and with the plasmid
pSM51-Luc, to rescue the luciferase gene into E3 of
the BAV3 genome by in vivo recombination (Fig. 10).
Digestion of the wt BAV3 DNA with PvuI helped to
minimize the generation of wt virus plaques following
cotransfection. The left end of the wt BAV3 genome,
represented by the PvuI 'A' fragment, falls between
m.u. 0 and 65.7, and it and pSM51-Luc, which extends
between m.u. 31.5 and 100 (except for the E3 deletion
replaced with the luciferase gene), share sufficient
overlapping BAV3 DNA sequences to generate recombinant
viruses.
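The requirement for overlapping sequence can be made concrete by intersecting the two map intervals. The sketch below is a simple interval calculation under the same illustrative 35 kb genome length; it says nothing about where the crossovers actually occurred, and the names are hypothetical.

```python
GENOME_BP = 35_000  # assumed BAV3 genome length, for illustration only


def overlap_mu(a: tuple, b: tuple) -> float:
    """Length, in map units, of the overlap between two (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0.0, end - start)


pvui_a_fragment = (0.0, 65.7)     # left-end PvuI 'A' fragment of wt BAV3 DNA
psm51_luc_insert = (31.5, 100.0)  # BAV3 sequences carried by pSM51-Luc

shared_mu = overlap_mu(pvui_a_fragment, psm51_luc_insert)
shared_bp = shared_mu / 100 * GENOME_BP
print(f"homologous overlap: {shared_mu:.1f} m.u. (~{shared_bp / 1000:.1f} kb)")
```

An overlap of roughly 34 m.u. (about 12 kb) leaves ample homology for recombination between the left-end fragment and the plasmid.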

Two virus plaques, obtained in two independent
cotransfection experiments, were grown in MDBK cells.
The viral DNA from both plaques was extracted and
analyzed by agarose gel electrophoresis after
digestion with BamHI, EcoRI or XbaI to identify the
presence and orientation of the luciferase gene in the
viral genome (data not shown). In the genomes of both
recombinants, the luciferase gene was present in the
E3 region in the E3-parallel orientation. The
BAV3-luciferase recombinants were plaque purified and
named BAV3-Luc (3.1) and BAV3-Luc (3.2) to represent
plaques obtained from the two independent experiments.
Since both recombinant virus isolates were identical,
they will be referred to as BAV3-Luc. The presence of
the luciferase gene in the BAV3-Luc isolates was
further confirmed by Southern blot analyses and by
luciferase assays using extracts from recombinant
virus-infected cells.

Characterization of BAV3 recombinants

Southern blot analyses of wt BAV3 and recombinant
genomic DNA, digested with BamHI, EcoRI or XbaI, were
carried out to confirm the
of luciferase activity was first observed at 12 h
post-infection, reached a peak at 30 h post-infection
and then declined. At 30 h post-infection,
approximately 425 pg of luciferase was detected in
4 × 10^5 BAV3-Luc (3.1)-infected MDBK cells. In MDBK
cells infected with wt BAV3, luciferase expression was
not detected (data not shown). The kinetics of
luciferase expression by BAV3-Luc (3.1) and BAV3-Luc
(3.2) appeared very similar. The kinetics of
luciferase expression also showed that the majority of
enzyme expression in virus-infected cells seemed to
occur late in infection. To determine luciferase
expression in the absence of viral DNA replication,
BAV3-Luc-infected MDBK cells were incubated in the
presence of an inhibitor of DNA synthesis,
1-β-D-arabinofuranosylcytosine (AraC), and luciferase
activity was measured in virus-infected cell extracts
at various times post-infection and compared to
luciferase expression obtained in the absence of AraC
(Fig. 14). When the recombinant virus-infected cells
were incubated in the presence of AraC, luciferase
expression at 18, 24 and 30 h post-infection was
approximately 20-30% of the value obtained in the
absence of AraC. These results indicated that the
majority of luciferase expression in MDBK cells
infected with BAV3-Luc took place after the onset of
viral DNA synthesis. To confirm this, MDBK cells
infected with BAV3-Luc were grown in the absence or
presence of AraC and harvested at 18 h, 24 h and 30 h
post-infection; viral DNA was extracted and analyzed by
dot blot analysis using pSM51-Luc (see Fig. 9) as a
probe (data not shown). In the presence of AraC, viral
DNA synthesis was severely reduced compared to viral
DNA synthesis in the absence of AraC.
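To put the peak yield in perspective, the reported 425 pg of luciferase in 4 × 10^5 infected cells can be converted into an approximate amount per cell. The sketch assumes a luciferase mass of about 62 kDa (the apparent size of firefly luciferase on Western blots) and that every cell in the sample was infected; it is a back-of-envelope estimate, not a measured value.

```python
AVOGADRO = 6.02214076e23     # molecules per mole
LUCIFERASE_DA = 62_000       # assumed mass, from the ~62 kDa luciferase band

total_pg = 425.0             # luciferase at 30 h post-infection (from the text)
cells = 4e5                  # infected MDBK cells in the sample

fg_per_cell = total_pg * 1_000 / cells            # 1 pg = 1,000 fg
grams_per_cell = fg_per_cell * 1e-15
molecules_per_cell = grams_per_cell / LUCIFERASE_DA * AVOGADRO

print(f"{fg_per_cell:.2f} fg per cell, ~{molecules_per_cell:,.0f} molecules per cell")
```

Under these assumptions the yield works out to roughly 1 fg, or on the order of 10^4 luciferase molecules, per cell.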



Western blot analysis of BAV3-Luc-infected cells



Luciferase was expressed as an active enzyme, as
determined by luciferase assays using extracts from
MDBK cells infected with BAV3-Luc (see Fig. 13).
Because the luciferase gene was inserted into E3 of
the BAV3 genome without any exogenous regulatory
sequences, luciferase could have been expressed as a
fusion protein with part of an E3 protein if the
luciferase gene were in the same frame as an E3 open
reading frame (ORF), such as those in frames F1 and F3
(Fig. 15), or a fusion protein could arise from
recognition of an upstream initiation codon in the
luciferase reading frame. To explore this possibility
we sequenced the DNA at the junction of the luciferase
gene and the BAV3 sequences, using the plasmid
pSM51-Luc and a synthetic primer designed to bind
luciferase coding sequences near the initiation codon
(data not shown). The luciferase coding region fell in
frame F2. The luciferase initiation codon was the
first start codon in this frame; however, the ORF
started 84 nucleotides upstream of the luciferase
start codon. To further confirm that the expressed
luciferase protein has the same molecular weight as
purified firefly luciferase, unlabeled mock-infected,
wt BAV3-infected and BAV3-Luc-infected MDBK cell
extracts were reacted with an anti-luciferase antibody
in a Western blot (Fig. 16). A 62 kDa polypeptide
band, of the same molecular weight as pure firefly
luciferase (lane 5), was visible in the
BAV3-Luc-infected cell extracts (lanes 3 and 4). It is
not clear whether a band of approximately 30 kDa which
also reacted with the anti-luciferase antibody in
lanes 3 and 4 represents a degraded luciferase
protein.
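The junction analysis above rests on a simple rule: an upstream ATG in the same frame as luciferase would give an N-terminally extended fusion, whereas an ORF that opens 84 nt (28 codons) upstream but contains no ATG before the luciferase start codon should not. The sketch below illustrates that check on a hypothetical toy junction, not the actual pSM51-Luc sequence; the helper name is illustrative, and it ignores non-AUG initiation and leaky scanning.

```python
from typing import Optional

STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def first_atg_in_frame(seq: str, orf_start: int) -> Optional[int]:
    """0-based index of the first ATG at or after `orf_start` in the same frame.

    Scanning stops at an in-frame stop codon, in which case None is returned.
    """
    seq = seq.upper()
    for i in range(orf_start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "ATG":
            return i
        if codon in STOP_CODONS:
            return None
    return None


# Hypothetical junction: a stop, 84 nt free of stops and ATGs, then the ATG
junction = "TAA" + "GCC" * 28 + "ATGGAA"
orf_start = 3                      # ORF opens right after the upstream TAA
atg = first_atg_in_frame(junction, orf_start)
upstream_nt = atg - orf_start
print(upstream_nt, upstream_nt % 3 == 0, upstream_nt // 3)  # 84 True 28
```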

The majority of luciferase expression is probably
driven from the major late promoter (MLP), providing
expression that parallels viral late gene expression,
whereas the enzyme expression seen in the presence of
AraC may be driven from the E3 promoter. In HAd5
vectors, foreign genes inserted into E3 without any
exogenous regulatory sequences also displayed late
kinetics and were inhibited by AraC. The BAV3
recombinant virus replicated relatively well in
cultured cells, but not as well as wt BAV3. This is
not surprising, as infectious virus titers of a number
of HAd5 recombinants were slightly lower than that of
wt HAd5 (Bett et al (1993) J. Virol. 67:5911-5921).
This may be because of reduced expression of fiber
protein in recombinant adenoviruses having inserts in
the E3 region compared to the wt virus (Bett et al,
supra, and Mittal et al (1993) Virus Res. 28:67-90).

The E3 region of BAV3 is approximately half the size
of the E3 region of HAd2 or HAd5 and thus has the
coding potential for only about half as many proteins
as E3 of HAd2 or HAd5 (Cladaras et al (1985) Virology
140:28-43; Herisse et al (1980) Nuc. Acids Res.
8:2173-2192; Herisse et al (1981) Nuc. Acids Res.
9:1229-1249; and Mittal et al (1993) J. Gen. Virol.
73:3295-3300). BAV3 E3 gene products have been shown
not to be required for virus growth in tissue culture.
However, it is presently known that BAV3 E3 gene
products, like HAd E3 proteins, also evade immune
surveillance in vivo. One of the BAV3 E3 open reading
frames (ORFs) has been shown to have amino acid
homology with the 14.7 kDa E3 protein of HAds (Mittal
et al (1993) supra). The 14.7 kDa E3 protein of HAds
prevents lysis of virus-infected mouse cells by tumour
necrosis factor (Gooding et al (1988) Cell 53:341-346
and Horton et al (1990) J. Virol. 64:1250-1255). The
study of pathogenesis and immune responses of a series
of BAV3 E3 deletion mutants in cattle provides useful
information regarding the role of E3 gene products in
modulating immune responses in their natural host.

The BAV3-based vector has a 0.7 kb E3 deletion which
can hold an insert of up to 2.5 kb in size. The BAV3
E3 deletion can probably be extended to about 1.4 kb,
which in turn would also increase the insertion
capacity of this system. The role of the MLP and the
E3 promoter is examined to determine their ability to
drive expression of a foreign gene inserted into E3
when a proper polyadenylation signal is provided.
Exogenous promoters, such as the simian virus 40
(SV40) promoter (Subramani et al (1983) Anal. Biochem.
135:1-15), the human cytomegalovirus immediate early
promoter (Boshart et al (1985) Cell 43:215-222) and
the human beta-actin promoter (Gunning et al (1987)
PNAS USA 84:4831-4835), are tested to evaluate their
ability to facilitate expression of foreign genes when
introduced into E3 of the BAV3 genome.
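The insert sizes quoted above are consistent with a simple packaging-limit calculation. HAd5 vectors have been reported to package genomes of up to about 105% of wild-type length (Bett et al (1993) J. Virol. 67:5911-5921, cited above); assuming, without direct evidence for BAV3, that a similar limit applies to a genome of about 35 kb, the sketch below estimates the insert capacity for the present 696 bp deletion and for an extended 1.4 kb deletion. The function name is hypothetical.

```python
def insert_capacity_bp(genome_bp: int, deletion_bp: int,
                       max_percent: int = 105) -> int:
    """Largest insert (bp) if the packaged genome may reach max_percent of wt."""
    extra_bp = genome_bp * (max_percent - 100) // 100  # headroom beyond wt size
    return deletion_bp + extra_bp


BAV3_GENOME_BP = 35_000  # approximate BAV3 genome length (assumed)

for deletion in (696, 1_400):  # current E3 deletion and a possible extension
    capacity = insert_capacity_bp(BAV3_GENOME_BP, deletion)
    print(deletion, "bp deleted ->", capacity, "bp insert")
```

This gives about 2.4 kb and 3.15 kb respectively, in line with the 2.5 kb figure for the current vector and with the gain expected from a larger deletion; the 1716 bp luciferase fragment fits comfortably.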

Recently, HAd-based expression vectors have come under
close scrutiny for their potential use in human gene
therapy (Ragot et al (1993) Nature 361:647-650;
Rosenfeld et al (1991) Science 252:431-434; Rosenfeld
et al (1992) Cell 68:141-155; and Stratford-Perricaudet
et al (1990) Hum. Gene Ther. 1:241-256). A preferable
adenovirus vector for gene therapy would be one which
maintains expression of the required gene indefinitely,
or for a long period, in the target organ or tissue.
This may be achieved if the recombinant virus vector
genome is incorporated into the host genome or is
maintained extrachromosomally as an independent entity
without active virus replication. HAds replicate very
well in humans, their natural host. HAds can be made
replication-defective by deleting the E1 region;
however, how such vectors would maintain expression of
the target gene in the required fashion is not clear.
Moreover, the presence of anti-HAd antibodies in
almost every human being may create problems for an
HAd-based delivery system. Adenovirus genomes have a
tendency to form circles in non-permissive cells.
BAV-based vectors could provide an alternative to
HAd-based vectors for human gene therapy. As BAV3 does
not replicate in humans, recombinant BAV3 genomes may
be maintained as independent circles in human cells,
providing expression of the essential protein for a
long period of time.
Foreign gene insertion into animal adenoviruses is
much more difficult than into HAds because it is hard
to develop a cell line that is also suitable for
adenovirus DNA-mediated transfection. This may be one
of the major reasons why no animal adenovirus-based
expression system has been reported so far. It took
us more than a year to isolate a cell line suitable
for BAV3 DNA-mediated transfection. Nevertheless, the
rapid implementation of BAV-based expression vectors
for the production of live recombinant virus vaccines
for farm animals is very promising. Because BAVs grow
in the respiratory and gastrointestinal tracts of
cattle, recombinant BAV-based vaccines can be used to
elicit a protective mucosal immune response, in
addition to humoral and cellular immune responses,
against pathogens for which mucosal immunity plays a
major role in protection.

Example 5 Generation of cell lines transformed with
the BAV3 E1 sequences

MDBK cells in monolayer cultures were
transfected with pSM71-neo, pSM61-kan1 or pSM61-kan2
by a lipofection-mediated transfection technique
(GIBCO/BRL, Life Technologies, Inc., Grand Island,
NY). At 48 h after transfection, the cells were
maintained in MEM supplemented with 5% fetal bovine
serum and 700 µg/ml G418. The medium was changed every
third day. In the presence of G418, only cells that
have stably incorporated the transfected plasmid DNA
into their genomes and express the neoʳ gene can grow.
Cells that have incorporated the neoʳ gene may also
have taken up the BAV3 E1 sequences and thus express
BAV3 E1 protein(s). A number of neoʳ (i.e.,
G418-resistant) colonies were isolated, expanded and
tested for the presence of BAV3 E1 message(s) by
Northern blot analysis using a DNA probe containing
only the BAV3 E1 sequences. Expression of BAV3 E1
protein(s) was confirmed by a complementation assay
using an HAd5 deletion mutant defective in E1 function
due to an E1 deletion.

Fetal bovine kidney cells in monolayers were
also transfected with pSM71-neo, pSM61-kan1 or
pSM61-kan2 by the lipofection-mediated transfection
technique, electroporation (Chu et al (1987) Nucl.
Acids Res. 15:1311-1326), or the calcium phosphate
precipitation technique (Graham et al (1973) Virology
52:456-467). Similarly, a number of G418-resistant
colonies were isolated, expanded and tested for the
presence of BAV3 E1 gene products as described above.

Example 6 Generation of a BAV3 recombinant containing
the beta-galactosidase gene as an E1 insert

As E1 gene products are essential for virus
replication, adenovirus recombinants containing E1
inserts will grow only in a cell line that is
transformed with the adenovirus E1 sequences and
expresses E1. A number of cell lines transformed with
the BAV3 E1 sequences were isolated as described
above. The technique for inserting foreign genes into
the E1 region is similar to that for gene insertion
into the E3 region of the BAV3 genome; however,
insertion into E1 requires an E1 transfer plasmid
containing DNA sequences from the left end of the BAV3
genome, an appropriate deletion and a cloning site for
the insertion of foreign DNA sequences. G418-resistant
MDBK cell monolayers were cotransfected with wild-type
(wt) BAV3 DNA and pSM71-Z following the
lipofection-mediated transfection procedure
(GIBCO/BRL, Life Technologies, Inc., Grand Island,
NY). The monolayers were incubated at 37 °C under an
agarose overlay. After one week of incubation, a second
overlay containing 300 µg/ml Blu-gal™ (GIBCO/BRL
Canada, Burlington, Ontario, Canada) was added to each
monolayer. Blue plaques were isolated and
plaque-purified, and the presence of the
beta-galactosidase gene in the BAV3 genome was
identified by agarose gel electrophoresis of
recombinant virus DNA digested with suitable
restriction enzymes and confirmed by
beta-galactosidase assays using extracts from
recombinant-virus-infected cells.



Deposit of Biological Materials 
The following materials were deposited and
are maintained with the Veterinary Infectious Disease
Organization (VIDO), Saskatoon, Saskatchewan, Canada.

The nucleotide sequences of the deposited
materials are incorporated by reference herein, as
well as the sequences of the polypeptides encoded
thereby. In the event of any discrepancy between a
sequence expressly disclosed herein and a deposited
sequence, the deposited sequence is controlling.

Material                                   Internal Accession No.   Deposit Date

Recombinant plasmids
   pSM51                                   pSM51                    Dec 6, 1993
   pSM71                                   pSM71                    Dec 6, 1993

Recombinant cell lines
   MDBK cells transformed with BAV3 E1
   sequences (MDBK-BAVE1)                                           Dec 6, 1993
   Fetal bovine kidney cells transformed
   with BAV3 E1 sequences (FBK-BAV-E1)                              Dec 6, 1993



While the present invention has been 
illustrated above by certain specific embodiments, the 
specific examples are not intended to limit the scope 
of the invention as described in the appended claims. 



SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: UNIVERSITY OF SASKATCHEWAN 

(ii) TITLE OF INVENTION: RECOMBINANT PROTEIN PRODUCTION IN BOVINE 

ADENOVIRUS EXPRESSION VECTOR SYSTEM 
(iii) NUMBER OF SEQUENCES: 34 
(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: SCOTT & AYLEN 

(B) STREET: 60 QUEEN STREET 

(C) CITY: OTTAWA 

(D) PROVINCE: ONTARIO 

(E) COUNTRY: CANADA 
(F) POSTAL CODE: K1P 5Y7

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: JOAN M. VAN ZANT 

(B) REFERENCE/DOCKET NUMBER: PAT 21976TW-90 
(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 1-416-368-2400 

(B) TELEFAX: 1-416-363-7246 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4060 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: join(606..1215, 1323..1345) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CATCATCAAT AATCTACAGT ACACTGATGG CAGCGGTCCA ACTGCCAATC ATTTTTGCCA 60 
CGTCATTTAT GACGCAACGA CGGCGAGCGT GGCGTGCTGA CGTAACTGTG GGGCGGAGCG 120
CGTCGCGGAG GCGGCGGCGC TGGGCGGGGC TGAGGGCGGC GGGGGCGGCG CGCGGGGCGG 180
CGCGCGGGGC GGGGCGAGGG GCGGAGTTCC GCACCCGCTA CGTCATTTTC AGACATTTTT 240
TAGCAAATTT GCGCCTTTTG CAAGCATTTT TCTCACATTT CAGGTATTTA GAGGGCGGAT 300
TTTTGGTGTT CGTACTTCCG TGTCACATAG TTCACTGTCA ATCTTCATTA CGGCTTAGAC 360
AAATTTTCGG CGTCTTTTCC GGGTTTATGT CCCCGGTCAC CTTTATGACT GTGTGAAACA 420 
CACCTGCCCA TTGTTTACCC TTGGTCAGTT TTTTCGTCTC CTAGGGTGGG AACATCAAGA 480

ACAAATTTGC CGAGTAATTG TGCACCTTTT TCCGCGTTAG GACTGCGTTT CACACGTAGA 540 

CAGACTTTTT CTCATTTTCT CACACTCCGT CGTCCGCTTC AGAGCTCTGC GTCTTCGCTG 600 

CCACC ATG AAG TAC CTG GTC CTC GTT CTC AAC GAC GGC ATG AGT CGA 647
Met Lys Tyr Leu Val Leu Val Leu Asn Asp Gly Met Ser Arg
1 5 10

ATT GAA AAA GCT CTC CTG TGC AGC GAT GGT GAG GTG GAT TTA GAG TGT 695 
Ile Glu Lys Ala Leu Leu Cys Ser Asp Gly Glu Val Asp Leu Glu Cys
15 20 25 30 

CAT GAG GTA CTT CCC CCT TCT CCC GCG CCT GTC CCC GCT TCT GTG TCA 743 
His Glu Val Leu Pro Pro Ser Pro Ala Pro Val Pro Ala Ser Val Ser 
35 40 45 

CCC GTG AGG AGT CCT CCT CCT CTG TCT CCG GTG TTT CCT CCG TCT CCG 791
Pro Val Arg Ser Pro Pro Pro Leu Ser Pro Val Phe Pro Pro Ser Pro
50 55 60 

CCA GCC CCG CTT GTG AAT CCA GAG GCG AGT TCG CTG CTG CAG CAG TAT 839 
Pro Ala Pro Leu Val Asn Pro Glu Ala Ser Ser Leu Leu Gln Gln Tyr
65 70 75 

CGG AGA GAG CTG TTA GAG AGG AGC CTG CTC CGA ACG GCC GAA GGT CAG 887 
Arg Arg Glu Leu Leu Glu Arg Ser Leu Leu Arg Thr Ala Glu Gly Gln
80 85 90

CAG CGT GCA GTG TGT CCA TGT GAG CGG TTG CCC GTG GAA GAG GAT GAG 935 
Gln Arg Ala Val Cys Pro Cys Glu Arg Leu Pro Val Glu Glu Asp Glu
95 100 105 110 

TGT CTG AAT GCC GTA AAT TTG CTG TTT CCT GAT CCC TGG CTA AAT GCA 983
Cys Leu Asn Ala Val Asn Leu Leu Phe Pro Asp Pro Trp Leu Asn Ala 
115 120 125 

GCT GAA AAT GGG GGT GAT ATT TTT AAG TCT CCG GCT ATG TCT CCA GAA 1031
Ala Glu Asn Gly Gly Asp Ile Phe Lys Ser Pro Ala Met Ser Pro Glu
130 135 140 

CCG TGG ATA GAT TTG TCT AGC TAC GAT AGC GAT GTA GAA GAG GTG ACT 1079 
Pro Trp Ile Asp Leu Ser Ser Tyr Asp Ser Asp Val Glu Glu Val Thr
145 150 155

AGT CAC TTT TTT CTG GAT TGC CCT GAA GAC CCC AGT CGG GAG TGT TCA 1127 
Ser His Phe Phe Leu Asp Cys Pro Glu Asp Pro Ser Arg Glu Cys Ser 
160 165 170

TCT TGT GGG TTT CAT CAG GCT CAA AGC GGA ATT CCA GGC ATT ATG TGC 1175 
Ser Cys Gly Phe His Gln Ala Gln Ser Gly Ile Pro Gly Ile Met Cys
175 180 185 190 

AGT TTG TGC TAC ATG CGC CAA ACC TAC CAT TGC ATC TAT A GTAAGTACAT 1225
Ser Leu Cys Tyr Met Arg Gln Thr Tyr His Cys Ile Tyr
195 200 

TCTGTAAAAG AACATCTTGG TGATTTCTAG GTATTGTTTA GGGATTAACT GGGTGGAGTG 1285

ATCTTAATCC GGCATAACCA AATACATGTT TTCACAG GT CCA GTT TCT GAA GAG 1339

Ser Pro Val Ser Glu Glu 
205 

GAA ATG TGAGTCATGT TGACTTTGGC GCGCAAGAGG AAATGTGAGT CATGTTGACT 1395 

Glu Met 

210 

TTGGCGCGCC CTACGGTGAC TTTAAAGCAA TTTGAGGATC ACTTTTTTGT TAGTCGCTAT 1455
AAAGTAGTCA CGGAGTCTTC ATGGATCACT TAAGCGTTCT TTTGGATTTG AAGCTGCTTC 1515 
GCTCTATCGT AGCGGGGGCT TCAAATCGCA CTGGAGTGTG GAAGAGGCGG CTGTGGCTGG 1575 
GACGCCTGAC TCAACTGGTC CATGATACCT GCGTAGAGAA CGAGAGCATA TTTCTCAATT 1635 



CTCTGCCAGG GAATGAAGCT TTTTTAAGGT TGCTTCGGAG CGGCTATTTT GAAGTGTTTG 1695
ACGTGTTTGT GGTGCCTGAG CTGCATCTGG ACACTCCGGG TCGAGTGGTC GCCGCTCTTG 1755

CTCTGCTGGT GTTCATCCTC AACGATTTAG ACGCTAATTC TGCTTCTTCA GGCTTTGATT 1815 

CAGGTTTTCT CGTGGACCGT CTCTGCGTGC CGCTATGGCT GAAGGCCAGG GCGTTCAAGA 1875 

TCACCCAGAG CTCCAGGAGC ACTTCGCAGC CTTCCTCGTC GCCCGACAAG ACGACCCAGA 1935 

CTACCAGCCA GTAGACGGGG ACAGCCCACC CCGGGCTAGC CTGGAGGAGG CTGAACAGAG 1995

CAGCACTCGT TTCGAGCACA TCAGTTACCG AGACGTGGTG GATGACTTCA ATAGATGCCA 2055 

TGATGTTTTT TATGAGAGGT ACAGTTTTGA GGACATAAAG AGCTACGAGG CTTTGCCTGA 2115 

GGACAATTTG GAGCAGCTCA TAGCTATGCA TGCTAAAATC AAGCTGCTGC CCGGTCGGGA 2175 

GTATGAGTTG ACTCAACCTT TGAACATAAC ATCTTGCGCC TATGTGCTCG GAAATGGGGC 2235 

TACTATTAGG GTAACAGGGG AAGCCTCCCC GGCTATTAGA GTGGGGGCCA TGGCCGTGGG 2295

TCCGTGTGTA ACAGGAATGA CTGGGGTGAC TTTTGTGAAT TGTAGGTTTG AGAGAGAGTC 2355 

AACAATTAGG GGGTCCCTGA TACGAGCTTC AACTCACGTG CTGTTTCATG GCTGTTATTT 2415 

TATGGGAATT ATGGGCACTT GTATTGAGGT GGGGGCGGGA GCTTACATTC GGGGTTGTGA 2475 

GTTTGTGGGC TGTTACCGGG GAATCTGTTC TACTTCTAAC AGAGATATTA AGGTGAGGCA 2535 

GTGCAACTTT GACAAATGCT TACTGGGTAT TACTTGTAAG GGGGACTATC GTCTTTCGGG 2595

AAATGTGTGT TCTGAGACTT TCTGCTTTGC TCATTTAGAG GGAGAGGGTT TGGTTAAAAA 2655 

CAACACAGTC AAGTCCCCTA GTCGCTGGAC CAGCGAGTCT GGCTTTTCCA TGATAACTTG 2715 

TGCAGACGGC AGGGTTACGC CTTTGGGTTC CCTCCACATT GTGGGCAACC GTTGTAGGCG 2775

TTGGCCAACC ATGCAGGGGA ATGTGTTTAT CATGTCTAAA CTGTATCTGG GCAACAGAAT 2835 

AGGGACTGTA GCCCTGCCCC AGTGTGCTTT CTACAAGTCC AGCATTTGTT TGGAGGAGAG 2895

GGCGACAAAC AAGCTGGTCT TGGCTTGTGC TTTTGAGAAT AATGTACTGG TGTACAAAGT 2955 

GCTGAGACGG GAGAGTCCCT CAACCGTGAA AATGTGTGTT TGTGGGACTT CTCATTATGC 3015

AAAGCCTTTG ACACTGGCAA TTATTTCTTC AGATATTCGG GCTAATCGAT ACATGTACAC 3075 

TGTGGACTCA ACAGAGTTCA CTTCTGACGA GGATTAAAAG TGGGCGGGGC CAAGAGGGGT 3135 

ATAAATAGGT GGGGAGGTTG AGGGGAGCCG TAGTTTCTGT TTTTCCCAGA CTGGGGGGGA 3195

CAACATGGCC GAGGAAGGGC GCATTTATGT GCCTTATGTA ACTGCCCGCC TGCCCAAGTG 3255 

GTCGGGTTCG GTGCAGGATA AGACGGGCTC GAACATGTTG GGGGGTGTGG TACTCCCTCC 3315

TAATTCACAG GCGCACCGGA CGGAGACCGT GGGCACTGAG GCCACCAGAG ACAACCTGCA 3375

CGCCGAGGGA GCGCGTCGTC CTGAGGATCA GACGCCCTAC ATGATCTTGG TGGAGGACTC 3435 

TCTGGGAGGT TTGAAGAGGC GAATGGACTT GCTGGAAGAA TCTAATCAGC AGCTGCTGGC 3495

AACTCTCAAC CGTCTCCGTA CAGGACTCGC TGCCTATGTG CAGGCTAACC TTGTGGGCGG 3555 

CCAAGTTAAC CCCTTTGTTT AAATAAAAAT ACACTCATAC AGTTTATTAT GCTGTCAATA 3615 

AAATTCTTTA TTTTTCCTGT GATAATACCG TGTCCAGCGT GCTCTGTCAA TAAGGGTCCT 3675 

ATGCATCCTG AGAAGGGCCT CATATACCCA TGGCATGAAT ATTAAGATAC ATGGGCATAA 3735

GGCCCTCAGA AGGGTTGAGG TAGAGCCACT GCAGACTTTC GTGGGGAGGT AAGGTGTTGT 3795

AAATAATCCA GTCATACTGA CTGTGCTGGG CGTGGAAGGA AAAGATGTCT TTTAGAAGAA 3855 

GGGTGATTGG CAAAGGGAGG CTCTTAGTGT AGGTATTGAT AAATCTGTTC AGTTGGGAGG 3915 

GATGCATTCG GGGGCTAATA AGGTGGAGTT TAGCCTGAAT CTTAAGGTTG GCAATGTTGC 3975 

CCCCTAGGTC TTTGCGAGGA TTCATGTTGT GCAGTACCAC AAAAACAGAG TAGCCTGTGC 4035 
ATTTGGGGAA TTTATCATGA AGCTT 4060

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 211 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Lys Tyr Leu Val Leu Val Leu Asn Asp Gly Met Ser Arg Ile Glu
1 5 10 15

Lys Ala Leu Leu Cys Ser Asp Gly Glu Val Asp Leu Glu Cys His Glu 
20 25 30

Val Leu Pro Pro Ser Pro Ala Pro Val Pro Ala Ser Val Ser Pro Val 
35 40 45 

Arg Ser Pro Pro Pro Leu Ser Pro Val Phe Pro Pro Ser Pro Pro Ala 
50 55 60 

Pro Leu Val Asn Pro Glu Ala Ser Ser Leu Leu Gln Gln Tyr Arg Arg
65 70 75 80 

Glu Leu Leu Glu Arg Ser Leu Leu Arg Thr Ala Glu Gly Gln Gln Arg
85 90 95 

Ala Val Cys Pro Cys Glu Arg Leu Pro Val Glu Glu Asp Glu Cys Leu 
100 105 110 

Asn Ala Val Asn Leu Leu Phe Pro Asp Pro Trp Leu Asn Ala Ala Glu 
115 120 125 

Asn Gly Gly Asp Ile Phe Lys Ser Pro Ala Met Ser Pro Glu Pro Trp
130 135 140 

Ile Asp Leu Ser Ser Tyr Asp Ser Asp Val Glu Glu Val Thr Ser His
145 150 155 160 

Phe Phe Leu Asp Cys Pro Glu Asp Pro Ser Arg Glu Cys Ser Ser Cys 
165 170 175 

Gly Phe His Gln Ala Gln Ser Gly Ile Pro Gly Ile Met Cys Ser Leu
180 185 190

Cys Tyr Met Arg Gln Thr Tyr His Cys Ile Tyr Ser Pro Val Ser Glu
195 200 205 

Glu Glu Met 
210 
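As an illustrative cross-check of the CDS feature join(606..1215, 1323..1345), the short Python sketch below splices the end of exon 1 to exon 2 and translates the result. The two fragments are copied from SEQ ID NO:1 (positions 1176-1215 and 1323-1348; the last three bases are the TGA stop codon, which the feature location itself stops short of). The helper `translate` and the compact codon-table string are assumptions written for this illustration using the standard genetic code; they are not part of the original disclosure. The splice joins the final A of exon 1 to the GT at the start of exon 2, so codon 204 reads AGT (Ser).

```python
# Illustrative check (hypothetical helper, not from the patent): splice the
# two exon segments of the SEQ ID NO:1 CDS and translate them with the
# standard genetic code.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG;
# "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA string codon by codon (one-letter code)."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# Exon 1, positions 1176..1215 of SEQ ID NO:1 (codon 191 through the first
# base of codon 204).
exon1_tail = "AGTTTGTGCTACATGCGCCAAACCTACCATTGCATCTATA"
# Exon 2, positions 1323..1348 (completes codon 204, then codons 205-211
# and the TGA stop codon).
exon2 = "GTCCAGTTTCTGAAGAGGAAATGTGA"

spliced = exon1_tail + exon2
protein = translate(spliced)
print(protein)  # prints SLCYMRQTYHCIYSPVSEEEM*
```

The output corresponds to residues 191-211 of SEQ ID NO:2 (Ser Leu Cys ... Glu Glu Met) followed by the stop, which is consistent with the codon split across the exon junction.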

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4060 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE: 
(A) NAME/KEY: CDS

(B) LOCATION: 1476..1946



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
CATCATCAAT AATCTACAGT ACACTGATGG CAGCGGTCCA ACTGCCAATC ATTTTTGCCA 60
CGTCATTTAT GACGCAACGA CGGCGAGCGT GGCGTGCTGA CGTAACTGTG GGGCGGAGCG 120
CGTCGCGGAG GCGGCGGCGC TGGGCGGGGC TGAGGGCGGC GGGGGCGGCG CGCGGGGCGG 180

CGCGCGGGGC GGGGCGAGGG GCGGAGTTCC GCACCCGCTA CGTCATTTTC AGACATTTTT 240 

TAGCAAATTT GCGCCTTTTG CAAGCATTTT TCTCACATTT CAGGTATTTA GAGGGCGGAT 300 

TTTTGGTGTT CGTACTTCCG TGTCACATAG TTCACTGTCA ATCTTCATTA CGGCTTAGAC 360 

AAATTTTCGG CGTCTTTTCC GGGTTTATGT CCCCGGTCAC CTTTATGACT GTGTGAAACA 420

CACCTGCCCA TTGTTTACCC TTGGTCAGTT TTTTCGTCTC CTAGGGTGGG AACATCAAGA 480 

ACAAATTTGC CGAGTAATTG TGCACCTTTT TCCGCGTTAG GACTGCGTTT CACACGTAGA 540 

CAGACTTTTT CTCATTTTCT CACACTCCGT CGTCCGCTTC AGAGCTCTGC GTCTTCGCTG 600 

CCACCATGAA GTACCTGGTC CTCGTTCTCA ACGACGGCAT GAGTCGAATT GAAAAAGCTC 660 

TCCTGTGCAG CGATGGTGAG GTGGATTTAG AGTGTCATGA GGTACTTCCC CCTTCTCCCG 720

CGCCTGTCCC CGCTTCTGTG TCACCCGTGA GGAGTCCTCC TCCTCTGTCT CCGGTGTTTC 780 

CTCCGTCTCC GCCAGCCCCG CTTGTGAATC CAGAGGCGAG TTCGCTGCTG CAGCAGTATC 840 

GGAGAGAGCT GTTAGAGAGG AGCCTGCTCC GAACGGCCGA AGGTCAGCAG CGTGCAGTGT 900

GTCCATGTGA GCGGTTGCCC GTGGAAGAGG ATGAGTGTCT GAATGCCGTA AATTTGCTGT 960 

TTCCTGATCC CTGGCTAAAT GCAGCTGAAA ATGGGGGTGA TATTTTTAAG TCTCCGGCTA 1020

TGTCTCCAGA ACCGTGGATA GATTTGTCTA GCTACGATAG CGATGTAGAA GAGGTGACTA 1080 

GTCACTTTTT TCTGGATTGC CCTGAAGACC CCAGTCGGGA GTGTTCATCT TGTGGGTTTC 1140 

ATCAGGCTCA AAGCGGAATT CCAGGCATTA TGTGCAGTTT GTGCTACATG CGCCAAACCT 1200 

ACCATTGCAT CTATAGTAAG TACATTCTGT AAAAGAACAT CTTGGTGATT TCTAGGTATT 1260 

GTTTAGGGAT TAACTGGGTG GAGTGATCTT AATCCGGCAT AACCAAATAC ATGTTTTCAC 1320

AGGTCCAGTT TCTGAAGAGG AAATGTGAGT CATGTTGACT TTGGCGCGCA AGAGGAAATG 1380 

TGAGTCATGT TGACTTTGGC GCGCCCTACG GTGACTTTAA AGCAATTTGA GGATCACTTT 1440 

TTTGTTAGTC GCTATAAAGT AGTCACGGAG TCTTC ATG GAT CAC TTA AGC GTT 1493 

Met Asp His Leu Ser Val 
1 5 

CTT TTG GAT TTG AAG CTG CTT CGC TCT ATC GTA GCG GGG GCT TCA AAT 1541
Leu Leu Asp Leu Lys Leu Leu Arg Ser Ile Val Ala Gly Ala Ser Asn
10 15 20 

CGC ACT GGA GTG TGG AAG AGG CGG CTG TGG CTG GGA CGC CTG ACT CAA 1589 
Arg Thr Gly Val Trp Lys Arg Arg Leu Trp Leu Gly Arg Leu Thr Gln
25 30 35 

CTG GTC CAT GAT ACC TGC GTA GAG AAC GAG AGC ATA TTT CTC AAT TCT 1637 
Leu Val His Asp Thr Cys Val Glu Asn Glu Ser Ile Phe Leu Asn Ser
40 45 50

CTG CCA GGG AAT GAA GCT TTT TTA AGG TTG CTT CGG AGC GGC TAT TTT 1685 
Leu Pro Gly Asn Glu Ala Phe Leu Arg Leu Leu Arg Ser Gly Tyr Phe 
55 60 65 70 

GAA GTG TTT GAC GTG TTT GTG GTG CCT GAG CTG CAT CTG GAC ACT CCG 1733 
Glu Val Phe Asp Val Phe Val Val Pro Glu Leu His Leu Asp Thr Pro 
75 80 85 

GGT CGA GTG GTC GCC GCT CTT GCT CTG CTG GTG TTC ATC CTC AAC GAT 1781
Gly Arg Val Val Ala Ala Leu Ala Leu Leu Val Phe Ile Leu Asn Asp
90 95 100 

TTA GAC GCT AAT TCT GCT TCT TCA GGC TTT GAT TCA GGT TTT CTC GTG 1829
Leu Asp Ala Asn Ser Ala Ser Ser Gly Phe Asp Ser Gly Phe Leu Val 
105 110 115 



GAC CGT CTC TGC GTG CCG CTA TGG CTG AAG GCC AGG GCG TTC AAG ATC 1877
Asp Arg Leu Cys Val Pro Leu Trp Leu Lys Ala Arg Ala Phe Lys Ile
120 125 130

WO 95/16048 PCT/CA94/00678

ACC CAG AGC TCC AGG AGC ACT TCG CAG CCT TCC TCG TCG CCC GAC AAG 1925
Thr Gln Ser Ser Arg Ser Thr Ser Gln Pro Ser Ser Ser Pro Asp Lys
135 140 145 150

ACG ACC CAG ACT ACC AGC CAG TAGACGGGGA CAGCCCACCC CGGGCTAGCC 1976 
Thr Thr Gln Thr Thr Ser Gln
155 

TGGAGGAGGC TGAACAGAGC AGCACTCGTT TCGAGCACAT CAGTTACCGA GACGTGGTGG 2036 

ATGACTTCAA TAGATGCCAT GATGTTTTTT ATGAGAGGTA CAGTTTTGAG GACATAAAGA 2096 

GCTACGAGGC TTTGCCTGAG GACAATTTGG AGCAGCTCAT AGCTATGCAT GCTAAAATCA 2156 

AGCTGCTGCC CGGTCGGGAG TATGAGTTGA CTCAACCTTT GAACATAACA TCTTGCGCCT 2216 

ATGTGCTCGG AAATGGGGCT ACTATTAGGG TAACAGGGGA AGCCTCCCCG GCTATTAGAG 2276 

TGGGGGCCAT GGCCGTGGGT CCGTGTGTAA CAGGAATGAC TGGGGTGACT TTTGTGAATT 2336 

GTAGGTTTGA GAGAGAGTCA ACAATTAGGG GGTCCCTGAT ACGAGCTTCA ACTCACGTGC 2396 

TGTTTCATGG CTGTTATTTT ATGGGAATTA TGGGCACTTG TATTGAGGTG GGGGCGGGAG 2456 

CTTACATTCG GGGTTGTGAG TTTGTGGGCT GTTACCGGGG AATCTGTTCT ACTTCTAACA 2516 

GAGATATTAA GGTGAGGCAG TGCAACTTTG ACAAATGCTT ACTGGGTATT ACTTGTAAGG 2576

GGGACTATCG TCTTTCGGGA AATGTGTGTT CTGAGACTTT CTGCTTTGCT CATTTAGAGG 2636

GAGAGGGTTT GGTTAAAAAC AACACAGTCA AGTCCCCTAG TCGCTGGACC AGCGAGTCTG 2696 

GCTTTTCCAT GATAACTTGT GCAGACGGCA GGGTTACGCC TTTGGGTTCC CTCCACATTG 2756 

TGGGCAACCG TTGTAGGCGT TGGCCAACCA TGCAGGGGAA TGTGTTTATC ATGTCTAAAC 2816 

TGTATCTGGG CAACAGAATA GGGACTGTAG CCCTGCCCCA GTGTGCTTTC TACAAGTCCA 2876 

GCATTTGTTT GGAGGAGAGG GCGACAAACA AGCTGGTCTT GGCTTGTGCT TTTGAGAATA 2936 

ATGTACTGGT GTACAAAGTG CTGAGACGGG AGAGTCCCTC AACCGTGAAA ATGTGTGTTT 2996 

GTGGGACTTC TCATTATGCA AAGCCTTTGA CACTGGCAAT TATTTCTTCA GATATTCGGG 3056

CTAATCGATA CATGTACACT GTGGACTCAA CAGAGTTCAC TTCTGACGAG GATTAAAAGT 3116 

GGGCGGGGCC AAGAGGGGTA TAAATAGGTG GGGAGGTTGA GGGGAGCCGT AGTTTCTGTT 3176 

TTTCCCAGAC TGGGGGGGAC AACATGGCCG AGGAAGGGCG CATTTATGTG CCTTATGTAA 3236 

CTGCCCGCCT GCCCAAGTGG TCGGGTTCGG TGCAGGATAA GACGGGCTCG AACATGTTGG 3296 

GGGCTGTGGT ACTCCCTCCT AATTCACAGG CGCACCGGAC GGAGACCGTG GGCACTGAGG 3356 

CCACCAGAGA CAACCTGCAC GCCGAGGGAG CGCGTCGTCC TGAGGATCAG ACGCCCTACA 3416

TGATCTTGGT GGAGGACTCT CTGGGAGGTT TGAAGAGGCG AATGGACTTG CTGGAAGAAT 3476 

CTAATCAGCA GCTGCTGGCA ACTCTCAACC GTCTCCGTAC AGGACTCGCT GCCTATGTGC 3536 

AGGCTAACCT TGTGGGCGGC CAAGTTAACC CCTTTGTTTA AATAAAAATA CACTCATACA 3596 
GTTTATTATG CTGTCAATAA AATTCTTTAT TTTTCCTGTG ATAATACCGT GTCCAGCGTG 3656

CTCTGTCAAT AAGGGTCCTA TGCATCCTGA GAAGGGCCTC ATATACCCAT GGCATGAATA 3716 

TTAAGATACA TGGGCATAAG GCCCTCAGAA GGGTTGAGGT AGAGCCACTG CAGACTTTCG 3776 

TGGGGAGGTA AGGTGTTGTA AATAATCCAG TCATACTGAC TGTGCTGGGC GTGGAAGGAA 3836

AAGATGTCTT TTAGAAGAAG GGTGATTGGC AAAGGGAGGC TCTTAGTGTA GGTATTGATA 3896 

AATCTGTTCA GTTGGGAGGG ATGCATTCGG GGGCTAATAA GGTGGAGTTT AGCCTGAATC 3956 
TTAAGGTTGG CAATGTTGCC CCCTAGGTCT TTGCGAGGAT TCATGTTGTG CAGTACCACA 4016
AAAACAGAGT AGCCTGTGCA TTTGGGGAAT TTATCATGAA GCTT 4060 

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 157 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Asp His Leu Ser Val Leu Leu Asp Leu Lys Leu Leu Arg Ser Ile
1 5 10 15

Val Ala Gly Ala Ser Asn Arg Thr Gly Val Trp Lys Arg Arg Leu Trp 
20 25 30 

Leu Gly Arg Leu Thr Gln Leu Val His Asp Thr Cys Val Glu Asn Glu
35 40 45 

Ser Ile Phe Leu Asn Ser Leu Pro Gly Asn Glu Ala Phe Leu Arg Leu
50 55 60 

Leu Arg Ser Gly Tyr Phe Glu Val Phe Asp Val Phe Val Val Pro Glu
65 70 75 80 

Leu His Leu Asp Thr Pro Gly Arg Val Val Ala Ala Leu Ala Leu Leu 
85 90 95 

Val Phe Ile Leu Asn Asp Leu Asp Ala Asn Ser Ala Ser Ser Gly Phe
100 105 110 

Asp Ser Gly Phe Leu Val Asp Arg Leu Cys Val Pro Leu Trp Leu Lys 
115 120 125

Ala Arg Ala Phe Lys Ile Thr Gln Ser Ser Arg Ser Thr Ser Gln Pro
130 135 140 

Ser Ser Ser Pro Asp Lys Thr Thr Gln Thr Thr Ser Gln
145 150 155 

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4060 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 
(A) NAME/KEY: CDS

(B) LOCATION: 1850..3109

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CATCATCAAT AATCTACAGT ACACTGATGG CAGCGGTCCA ACTGCCAATC ATTTTTGCCA 60 

CGTCATTTAT GACGCAACGA CGGCGAGCGT GGCGTGCTGA CGTAACTGTG GGGCGGAGCG 120 

CGTCGCGGAG GCGGCGGCGC TGGGCGGGGC TGAGGGCGGC GGGGGCGGCG CGCGGGGCGG 180

CGCGCGGGGC GGGGCGAGGG GCGGAGTTCC GCACCCGCTA CGTCATTTTC AGACATTTTT 240 

TAGCAAATTT GCGCCTTTTG CAAGCATTTT TCTCACATTT CAGGTATTTA GAGGGCGGAT 300

TTTTGGTGTT CGTACTTCCG TGTCACATAG TTCACTGTCA ATCTTCATTA CGGCTTAGAC 360 

AAATTTTCGG CGTCTTTTCC GGGTTTATGT CCCCGGTCAC CTTTATGACT GTGTGAAACA 420 
CACCTGCCCA TTGTTTACCC TTGGTCAGTT TTTTCGTCTC CTAGGGTGGG AACATCAAGA 480

ACAAATTTGC CGAGTAATTG TGCACCTTTT TCCGCGTTAG GACTGCGTTT CACACGTAGA 540 

CAGACTTTTT CTCATTTTCT CACACTCCGT CGTCCGCTTC AGAGCTCTGC GTCTTCGCTG 600 

CCACCATGAA GTACCTGGTC CTCGTTCTCA ACGACGGCAT GAGTCGAATT GAAAAAGCTC 660 

TCCTGTGCAG CGATGGTGAG GTGGATTTAG AGTGTCATGA GGTACTTCCC CCTTCTCCCG 720

CGCCTGTCCC CGCTTCTGTG TCACCCGTGA GGAGTCCTCC TCCTCTGTCT CCGGTGTTTC 780

CTCCGTCTCC GCCAGCCCCG CTTGTGAATC CAGAGGCGAG TTCGCTGCTG CAGCAGTATC 840 

GGAGAGAGCT GTTAGAGAGG AGCCTGCTCC GAACGGCCGA AGGTCAGCAG CGTGCAGTGT 900 

GTCCATGTGA GCGGTTGCCC GTGGAAGAGG ATGAGTGTCT GAATGCCGTA AATTTGCTGT 960 

TTCCTGATCC CTGGCTAAAT GCAGCTGAAA ATGGGGGTGA TATTTTTAAG TCTCCGGCTA 1020

TGTCTCCAGA ACCGTGGATA GATTTGTCTA GCTACGATAG CGATGTAGAA GAGGTGACTA 1080 

GTCACTTTTT TCTGGATTGC CCTGAAGACC CCAGTCGGGA GTGTTCATCT TGTGGGTTTC 1140 

ATCAGGCTCA AAGCGGAATT CCAGGCATTA TGTGCAGTTT GTGCTACATG CGCCAAACCT 1200

ACCATTGCAT CTATAGTAAG TACATTCTGT AAAAGAACAT CTTGGTGATT TCTAGGTATT 1260 

GTTTAGGGAT TAACTGGGTG GAGTGATCTT AATCCGGCAT AACCAAATAC ATGTTTTCAC 1320

AGGTCCAGTT TCTGAAGAGG AAATGTGAGT CATGTTGACT TTGGCGCGCA AGAGGAAATG 1380 

TGAGTCATGT TGACTTTGGC GCGCCCTACG GTGACTTTAA AGCAATTTGA GGATCACTTT 1440 

TTTGTTAGTC GCTATAAAGT AGTCACGGAG TCTTCATGGA TCACTTAAGC GTTCTTTTGG 1500 

ATTTGAAGCT GCTTCGCTCT ATCGTAGCGG GGGCTTCAAA TCGCACTGGA GTGTGGAAGA 1560 

GGCGGCTGTG GCTGGGACGC CTGACTCAAC TGGTCCATGA TACCTGCGTA GAGAACGAGA 1620

GCATATTTCT CAATTCTCTG CCAGGGAATG AAGCTTTTTT AAGGTTGCTT CGGAGCGGCT 1680 

ATTTTGAAGT GTTTGACGTG TTTGTGGTGC CTGAGCTGCA TCTGGACACT CCGGGTCGAG 1740 

TGGTCGCCGC TCTTGCTCTG CTGGTGTTCA TCCTCAACGA TTTAGACGCT AATTCTGCTT 1800

CTTCAGGCTT TGATTCAGGT TTTCTCGTGG ACCGTCTCTG CGTGCCGCT ATG GCT 1855 

Met Ala 

1

GAA GGC CAG GGC GTT CAA GAT CAC CCA GAG CTC CAG GAG CAC TTC GCA 1903 
Glu Gly Gln Gly Val Gln Asp His Pro Glu Leu Gln Glu His Phe Ala
5 10 15 

GCC TTC CTC GTC GCC CGA CAA GAC GAC CCA GAC TAC CAG CCA GTA GAC 1951 
Ala Phe Leu Val Ala Arg Gln Asp Asp Pro Asp Tyr Gln Pro Val Asp
20 25 30 

GGG GAC AGC CCA CCC CGG GCT AGC CTG GAG GAG GCT GAA CAG AGC AGC 1999
Gly Asp Ser Pro Pro Arg Ala Ser Leu Glu Glu Ala Glu Gln Ser Ser
35 40 45 50 

ACT CGT TTC GAG CAC ATC AGT TAC CGA GAC GTG GTG GAT GAC TTC AAT 2047
Thr Arg Phe Glu His Ile Ser Tyr Arg Asp Val Val Asp Asp Phe Asn
55 60 65

AGA TGC CAT GAT GTT TTT TAT GAG AGG TAC AGT TTT GAG GAC ATA AAG 2095 
Arg Cys His Asp Val Phe Tyr Glu Arg Tyr Ser Phe Glu Asp Ile Lys
70 75 80

AGC TAC GAG GCT TTG CCT GAG GAC AAT TTG GAG CAG CTC ATA GCT ATG 2143 
Ser Tyr Glu Ala Leu Pro Glu Asp Asn Leu Glu Gln Leu Ile Ala Met
85 90 95 

CAT GCT AAA ATC AAG CTG CTG CCC GGT CGG GAG TAT GAG TTG ACT CAA 2191 
His Ala Lys Ile Lys Leu Leu Pro Gly Arg Glu Tyr Glu Leu Thr Gln
100 105 110 
CCT TTG AAC ATA ACA TCT TGC GCC TAT GTG CTC GGA AAT GGG GCT ACT 2239
Pro Leu Asn Ile Thr Ser Cys Ala Tyr Val Leu Gly Asn Gly Ala Thr
115 120 125 130 

ATT AGG GTA ACA GGG GAA GCC TCC CCG GCT ATT AGA GTG GGG GCC ATG 2287 
Ile Arg Val Thr Gly Glu Ala Ser Pro Ala Ile Arg Val Gly Ala Met
135 140 145 

GCC GTG GGT CCG TGT GTA ACA GGA ATG ACT GGG GTG ACT TTT GTG AAT 2335
Ala Val Gly Pro Cys Val Thr Gly Met Thr Gly Val Thr Phe Val Asn 
150 155 160 

TGT AGG TTT GAG AGA GAG TCA ACA ATT AGG GGG TCC CTG ATA CGA GCT 2383 
Cys Arg Phe Glu Arg Glu Ser Thr Ile Arg Gly Ser Leu Ile Arg Ala
165 170 175 

TCA ACT CAC GTG CTG TTT CAT GGC TGT TAT TTT ATG GGA ATT ATG GGC 2431 
Ser Thr His Val Leu Phe His Gly Cys Tyr Phe Met Gly Ile Met Gly
180 185 190

ACT TGT ATT GAG GTG GGG GCG GGA GCT TAC ATT CGG GGT TGT GAG TTT 2479 
Thr Cys Ile Glu Val Gly Ala Gly Ala Tyr Ile Arg Gly Cys Glu Phe
195 200 205 210 

GTG GGC TGT TAC CGG GGA ATC TGT TCT ACT TCT AAC AGA GAT ATT AAG 2527 
Val Gly Cys Tyr Arg Gly Ile Cys Ser Thr Ser Asn Arg Asp Ile Lys
215 220 225 

GTG AGG CAG TGC AAC TTT GAC AAA TGC TTA CTG GGT ATT ACT TGT AAG 2575
Val Arg Gln Cys Asn Phe Asp Lys Cys Leu Leu Gly Ile Thr Cys Lys
230 235 240 

GGG GAC TAT CGT CTT TCG GGA AAT GTG TGT TCT GAG ACT TTC TGC TTT 2623 
Gly Asp Tyr Arg Leu Ser Gly Asn Val Cys Ser Glu Thr Phe Cys Phe 
245 250 255 

GCT CAT TTA GAG GGA GAG GGT TTG GTT AAA AAC AAC ACA GTC AAG TCC 2671 
Ala His Leu Glu Gly Glu Gly Leu Val Lys Asn Asn Thr Val Lys Ser 
260 265 270

CCT AGT CGC TGG ACC AGC GAG TCT GGC TTT TCC ATG ATA ACT TGT GCA 2719 
Pro Ser Arg Trp Thr Ser Glu Ser Gly Phe Ser Met Ile Thr Cys Ala
275 280 285 290

GAC GGC AGG GTT ACG CCT TTG GGT TCC CTC CAC ATT GTG GGC AAC CGT 2767 
Asp Gly Arg Val Thr Pro Leu Gly Ser Leu His Ile Val Gly Asn Arg
295 300 305 

TGT AGG CGT TGG CCA ACC ATG CAG GGG AAT GTG TTT ATC ATG TCT AAA 2815
Cys Arg Arg Trp Pro Thr Met Gln Gly Asn Val Phe Ile Met Ser Lys
310 315 320 

CTG TAT CTG GGC AAC AGA ATA GGG ACT GTA GCC CTG CCC CAG TGT GCT 2863 
Leu Tyr Leu Gly Asn Arg Ile Gly Thr Val Ala Leu Pro Gln Cys Ala
325 330 335 

TTC TAC AAG TCC AGC ATT TGT TTG GAG GAG AGG GCG ACA AAC AAG CTG 2911 
Phe Tyr Lys Ser Ser Ile Cys Leu Glu Glu Arg Ala Thr Asn Lys Leu
340 345 350

GTC TTG GCT TGT GCT TTT GAG AAT AAT GTA CTG GTG TAC AAA GTG CTG 2959 
Val Leu Ala Cys Ala Phe Glu Asn Asn Val Leu Val Tyr Lys Val Leu 
355 360 365 370 

AGA CGG GAG AGT CCC TCA ACC GTG AAA ATG TGT GTT TGT GGG ACT TCT 3007 
Arg Arg Glu Ser Pro Ser Thr Val Lys Met Cys Val Cys Gly Thr Ser 
375 380 385 

CAT TAT GCA AAG CCT TTG ACA CTG GCA ATT ATT TCT TCA GAT ATT CGG 3055
His Tyr Ala Lys Pro Leu Thr Leu Ala Ile Ile Ser Ser Asp Ile Arg
390 395 400 

GCT AAT CGA TAC ATG TAC ACT GTG GAC TCA ACA GAG TTC ACT TCT GAC 3103 
Ala Asn Arg Tyr Met Tyr Thr Val Asp Ser Thr Glu Phe Thr Ser Asp 
405 410 415 

GAG GAT TAAAAGTGGG CGGGGCCAAG AGGGGTATAA ATAGGTGGGG AGGTTGAGGG 3159 
Glu Asp
420 

GAGCCGTAGT TTCTGTTTTT CCCAGACTGG GGGGGACAAC ATGGCCGAGG AAGGGCGCAT 3219 

TTATGTGCCT TATGTAACTG CCCGCCTGCC CAAGTGGTCG GGTTCGGTGC AGGATAAGAC 3279 

GGGCTCGAAC ATGTTGGGGG GTGTGGTACT CCCTCCTAAT TCACAGGCGC ACCGGACGGA 3339 


GACCGTGGGC ACTGAGGCCA CCAGAGACAA CCTGCACGCC GAGGGAGCGC GTCGTCCTGA 3399 

GGATCAGACG CCCTACATGA TCTTGGTGGA GGACTCTCTG GGAGGTTTGA AGAGGCGAAT 3459 

GGACTTGCTG GAAGAATCTA ATCAGCAGCT GCTGGCAACT CTCAACCGTC TCCGTACAGG 3519 

ACTCGCTGCC TATGTGCAGG CTAACCTTGT GGGCGGCCAA GTTAACCCCT TTGTTTAAAT 3579 

AAAAATACAC TCATACAGTT TATTATGCTG TCAATAAAAT TCTTTATTTT TCCTGTGATA 3639

ATACCGTGTC CAGCGTGCTC TGTCAATAAG GGTCCTATGC ATCCTGAGAA GGGCCTCATA 3699 

TACCCATGGC ATGAATATTA AGATACATGG GCATAAGGCC CTCAGAAGGG TTGAGGTAGA 3759 

GCCACTGCAG ACTTTCGTGG GGAGGTAAGG TGTTGTAAAT AATCCAGTCA TACTGACTGT 3819 

GCTGGGCGTG GAAGGAAAAG ATGTCTTTTA GAAGAAGGGT GATTGGCAAA GGGAGGCTCT 3879 

TAGTGTAGGT ATTGATAAAT CTGTTCAGTT GGGAGGGATG CATTCGGGGG CTAATAAGGT 3939 

GGAGTTTAGC CTGAATCTTA AGGTTGGCAA TGTTGCCCCC TAGGTCTTTG CGAGGATTCA 3999 

TGTTGTGCAG TACCACAAAA ACAGAGTAGC CTGTGCATTT GGGGAATTTA TCATGAAGCT 4059 

T 4060 

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 420 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Met Ala Glu Gly Gln Gly Val Gln Asp His Pro Glu Leu Gln Glu His
1 5 10 15

Phe Ala Ala Phe Leu Val Ala Arg Gln Asp Asp Pro Asp Tyr Gln Pro
20 25 30 

Val Asp Gly Asp Ser Pro Pro Arg Ala Ser Leu Glu Glu Ala Glu Gln
35 40 45 



Ser Ser Thr Arg Phe Glu His Ile Ser Tyr Arg Asp Val Val Asp Asp
50 55 60 

Phe Asn Arg Cys His Asp Val Phe Tyr Glu Arg Tyr Ser Phe Glu Asp 
65 70 75 80 

Ile Lys Ser Tyr Glu Ala Leu Pro Glu Asp Asn Leu Glu Gln Leu Ile
85 90 95 

Ala Met His Ala Lys Ile Lys Leu Leu Pro Gly Arg Glu Tyr Glu Leu
100 105 110 

Thr Gln Pro Leu Asn Ile Thr Ser Cys Ala Tyr Val Leu Gly Asn Gly
115 120 125 

Ala Thr Ile Arg Val Thr Gly Glu Ala Ser Pro Ala Ile Arg Val Gly
130 135 140 

Ala Met Ala Val Gly Pro Cys Val Thr Gly Met Thr Gly Val Thr Phe
145 150 155 160 
Val Asn Cys Arg Phe Glu Arg Glu Ser Thr Ile Arg Gly Ser Leu Ile
165 170 175 

Arg Ala Ser Thr His Val Leu Phe His Gly Cys Tyr Phe Met Gly Ile
180 185 190 

Met Gly Thr Cys Ile Glu Val Gly Ala Gly Ala Tyr Ile Arg Gly Cys
195 200 205 

Glu Phe Val Gly Cys Tyr Arg Gly Ile Cys Ser Thr Ser Asn Arg Asp
210 215 220 

Ile Lys Val Arg Gln Cys Asn Phe Asp Lys Cys Leu Leu Gly Ile Thr
225 230 235 240 

Cys Lys Gly Asp Tyr Arg Leu Ser Gly Asn Val Cys Ser Glu Thr Phe 
245 250 255

Cys Phe Ala His Leu Glu Gly Glu Gly Leu Val Lys Asn Asn Thr Val
260 265 270 

Lys Ser Pro Ser Arg Trp Thr Ser Glu Ser Gly Phe Ser Met Ile Thr
275 280 285 

Cys Ala Asp Gly Arg Val Thr Pro Leu Gly Ser Leu His Ile Val Gly
290 295 300 

Asn Arg Cys Arg Arg Trp Pro Thr Met Gln Gly Asn Val Phe Ile Met
305 310 315 320

Ser Lys Leu Tyr Leu Gly Asn Arg Ile Gly Thr Val Ala Leu Pro Gln
325 330 335 

Cys Ala Phe Tyr Lys Ser Ser Ile Cys Leu Glu Glu Arg Ala Thr Asn
340 345 350 






Lys Leu Val Leu Ala Cys Ala Phe Glu Asn Asn Val Leu Val Tyr Lys 
355 360 365 

Val Leu Arg Arg Glu Ser Pro Ser Thr Val Lys Met Cys Val Cys Gly 
370 375 380 

Thr Ser His Tyr Ala Lys Pro Leu Thr Leu Ala Ile Ile Ser Ser Asp
385 390 395 400 

Ile Arg Ala Asn Arg Tyr Met Tyr Thr Val Asp Ser Thr Glu Phe Thr
405 410 415 

Ser Asp Glu Asp
420 
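The CDS locations in this listing can also be checked arithmetically. Each segment spans end - start + 1 nucleotides, and because each location ends at the last sense codon (the stop codon lies just outside it), the summed length divided by three should equal the stated protein length. The following Python sketch assumes only the simple `a..b` and `join(a..b, c..d)` forms that appear above; the function names are hypothetical and do not correspond to any particular sequence-listing tool.

```python
import re


def segments(location: str) -> list[tuple[int, int]]:
    """Parse 'a..b' or 'join(a..b, c..d, ...)' into (start, end) pairs."""
    pairs = re.findall(r"(\d+)\s*\.\.\s*(\d+)", location)
    return [(int(a), int(b)) for a, b in pairs]


def cds_length(location: str) -> int:
    """Total length in nucleotides; both ends of each segment are inclusive."""
    return sum(end - start + 1 for start, end in segments(location))


def residue_count(location: str) -> int:
    """Number of codons, assuming the stop codon lies outside the location."""
    length = cds_length(location)
    if length % 3 != 0:
        raise ValueError(f"{location!r} spans {length} nt, not whole codons")
    return length // 3


# CDS locations copied from SEQ ID NOs 1, 3 and 5 of this listing.
features = {
    "SEQ ID NO:1": "join(606..1215, 1323..1345)",  # protein: SEQ ID NO:2
    "SEQ ID NO:3": "1476..1946",                   # protein: SEQ ID NO:4
    "SEQ ID NO:5": "1850..3109",                   # protein: SEQ ID NO:6
}

for name, loc in features.items():
    print(name, cds_length(loc), "nt ->", residue_count(loc), "aa")
```

For SEQ ID NO:1, for example, (1215 - 606 + 1) + (1345 - 1323 + 1) = 610 + 23 = 633 nt, or 211 codons, which agrees with the 211 amino acids of SEQ ID NO:2; the same check gives 157 and 420 residues for SEQ ID NOs 4 and 6.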

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4060 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic)



(ix) FEATURE: 

(A) NAME/KEY: COS 

(B) LOCATION: 3200.. 3574 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

CATCATCAAT AATCTACAGT ACACTGATGG CAGCGGTCCA ACTGCCAATC ATTTTTGCCA 60 

CGTCATTTAT GACGCAACGA CGGCGAGCGT GGCGTGCTGA CGTAACTGTG GGGCGGAGCG 120 

CGTCGCGGAG GCGGCGGCGC TGGGCGGGGC TGAGGGCGGC GGGGGCGGCG CGCGGGGCGG 180 

CGCGCGGGGC GGGGCGAGGG GCGGAGTTCC GCACCCGCTA CGTCATTTTC AGACATTTTT 240 
TAGCAAATTT GCGCCTTTTG CAAGCATTTT TCTCACATTT CAGGTATTTA GAGGGCGGAT 300

TTTTGGTGTT CGTACTTCCG TGTCACATAG TTCACTGTCA ATCTTCATTA CGGCTTAGAC 360

AAATTTTCGG CGTCTTTTCC GGGTTTATGT CCCCGGTCAC CTTTATGACT GTGTGAAACA 420

CACCTGCCCA TTGTTTACCC TTGGTCAGTT TTTTCGTCTC CTAGGGTGGG AACATCAAGA 480

ACAAATTTGC CGAGTAATTG TGCACCTTTT TCCGCGTTAG GACTGCGTTT CACACGTAGA 540

CAGACTTTTT CTCATTTTCT CACACTCCGT CGTCCGCTTC AGAGCTCTGC GTCTTCGCTG 600

CCACCATGAA GTACCTGGTC CTCGTTCTCA ACGACGGCAT GAGTCGAATT GAAAAAGCTC 660

TCCTGTGCAG CGATGGTGAG GTGGATTTAG AGTGTCATGA GGTACTTCCC CCTTCTCCCG 720

CGCCTGTCCC CGCTTCTGTG TCACCCGTGA GGAGTCCTCC TCCTCTGTCT CCGGTGTTTC 780

CTCCGTCTCC GCCAGCCCCG CTTGTGAATC CAGAGGCGAG TTCGCTGCTG CAGCAGTATC 840

GGAGAGAGCT GTTAGAGAGG AGCCTGCTCC GAACGGCCGA AGGTCAGCAG CGTGCAGTGT 900

GTCCATGTGA GCGGTTGCCC GTGGAAGAGG ATGAGTGTCT GAATGCCGTA AATTTGCTGT 960

TTCCTGATCC CTGGCTAAAT GCAGCTGAAA ATGGGGGTGA TATTTTTAAG TCTCCGGCTA 1020

TGTCTCCAGA ACCGTGGATA GATTTGTCTA GCTACGATAG CGATGTAGAA GAGGTGACTA 1080

GTCACTTTTT TCTGGATTGC CCTGAAGACC CCAGTCGGGA GTGTTCATCT TGTGGGTTTC 1140

ATCAGGCTCA AAGCGGAATT CCAGGCATTA TGTGCAGTTT GTGCTACATG CGCCAAACCT 1200

ACCATTGCAT CTATAGTAAG TACATTCTGT AAAAGAACAT CTTGGTGATT TCTAGGTATT 1260

GTTTAGGGAT TAACTGGGTG GAGTGATCTT AATCCGGCAT AACCAAATAC ATGTTTTCAC 1320

AGGTCCAGTT TCTGAAGAGG AAATGTGAGT CATGTTGACT TTGGCGCGCA AGAGGAAATG 1380

TGAGTCATGT TGACTTTGGC GCGCCCTACG GTGACTTTAA AGCAATTTGA GGATCACTTT 1440

TTTGTTAGTC GCTATAAAGT AGTCACGGAG TCTTCATGGA TCACTTAAGC GTTCTTTTGG 1500

ATTTGAAGCT GCTTCGCTCT ATCGTAGCGG GGGCTTCAAA TCGCACTGGA GTGTGGAAGA 1560

GGCGGCTGTG GCTGGGACGC CTGACTCAAC TGGTCCATGA TACCTGCGTA GAGAACGAGA 1620

GCATATTTCT CAATTCTCTG CCAGGGAATG AAGCTTTTTT AAGGTTGCTT CGGAGCGGCT 1680

ATTTTGAAGT GTTTGACGTG TTTGTGGTGC CTGAGCTGCA TCTGGACACT CCGGGTCGAG 1740

TGGTCGCCGC TCTTGCTCTG CTGGTGTTCA TCCTCAACGA TTTAGACGCT AATTCTGCTT 1800

CTTCAGGCTT TGATTCAGGT TTTCTCGTGG ACCGTCTCTG CGTGCCGCTA TGGCTGAAGG 1860

CCAGGGCGTT CAAGATCACC CAGAGCTCCA GGAGCACTTC GCAGCCTTCC TCGTCGCCCG 1920

ACAAGACGAC CCAGACTACC AGCCAGTAGA CGGGGACAGC CCACCCCGGG CTAGCCTGGA 1980

GGAGGCTGAA CAGAGCAGCA CTCGTTTCGA GCACATCAGT TACCGAGACG TGGTGGATGA 2040

CTTCAATAGA TGCCATGATG TTTTTTATGA GAGGTACAGT TTTGAGGACA TAAAGAGCTA 2100

CGAGGCTTTG CCTGAGGACA ATTTGGAGCA GCTCATAGCT ATGCATGCTA AAATCAAGCT 2160

GCTGCCCGGT CGGGAGTATG AGTTGACTCA ACCTTTGAAC ATAACATCTT GCGCCTATGT 2220

GCTCGGAAAT GGGGCTACTA TTAGGGTAAC AGGGGAAGCC TCCCCGGCTA TTAGAGTGGG 2280

GGCCATGGCC GTGGGTCCGT GTGTAACAGG AATGACTGGG GTGACTTTTG TGAATTGTAG 2340

GTTTGAGAGA GAGTCAACAA TTAGGGGGTC CCTGATACGA GCTTCAACTC ACGTGCTGTT 2400

TCATGGCTGT TATTTTATGG GAATTATGGG GACTTGTATT GAGGTGGGGG CGGGAGCTTA 2460

CATTCGGGGT TGTGAGTTTG TGGGCTGTTA CCGGGGAATC TGTTCTACTT CTAACAGAGA 2520

TATTAAGGTG AGGCAGTGCA ACTTTGACAA ATGCTTACTG GGTATTACTT GTAAGGGGGA 2580
CTATCGTCTT TCGGGAAATG TGTGTTCTGA GACTTTCTGC TTTGCTCATT TAGAGGGAGA 2640

GGGTTTGGTT AAAAACAACA CAGTCAAGTC CCCTAGTCGC TGGACCAGCG AGTCTGGCTT 2700

TTCCATGATA ACTTGTGCAG ACGGCAGGGT TACGCCTTTG GGTTCCCTCC ACATTGTGGG 2760

CAACCGTTGT AGGCGTTGGC CAACCATGCA GGGGAATGTG TTTATCATGT CTAAACTGTA 2820

TCTGGGCAAC AGAATAGGGA CTGTAGCCCT GCCCCAGTGT GCTTTCTACA AGTCCAGCAT 2880

[illegible] GAGAGGGCGA CAAACAAGCT GGTCTTGGCT TGTGCTTTTG AGAATAATGT 2940

ACTGGTGTAC AAAGTGCTGA GACGGGAGAG TCCCTCAACC GTGAAAATGT GTGTTTGTGG 3000

GACTTCTCAT TATGCAAAGC CTTTGACACT GGCAATTATT TCTTCAGATA TTCGGGCTAA 3060

TCGATACATG TACACTGTGG ACTCAACAGA GTTCACTTCT GACGAGGATT AAAAGTGGGC 3120

GGGGCCAAGA GGGGTATAAA TAGGTGGGGA GGTTGAGGGG AGCCGTAGTT TCTGTTTTTC 3180

CCAGACTGGG GGGGACAAC ATG GCC GAG GAA GGG CGC ATT TAT GTG CCT TAT 3232
Met Ala Glu Glu Gly Arg Ile Tyr Val Pro Tyr
1 5 10

GTA ACT GCC CGC CTG CCC AAG TGG TCG GGT TCG GTG CAG GAT AAG ACG 3280
Val Thr Ala Arg Leu Pro Lys Trp Ser Gly Ser Val Gln Asp Lys Thr
15 20 25

GGC TCG AAC ATG TTG GGG GGT GTG GTA CTC CCT CCT AAT TCA CAG GCG 3328
Gly Ser Asn Met Leu Gly Gly Val Val Leu Pro Pro Asn Ser Gln Ala
30 35 40

CAC CGG ACG GAG ACC GTG GGC ACT GAG GCC ACC AGA GAC AAC CTG CAC 3376
His Arg Thr Glu Thr Val Gly Thr Glu Ala Thr Arg Asp Asn Leu His
45 50 55

GCC GAG GGA GCG CGT CGT CCT GAG GAT CAG ACG CCC TAC ATG ATC TTG 3424
Ala Glu Gly Ala Arg Arg Pro Glu Asp Gln Thr Pro Tyr Met Ile Leu
60 65 70 75

GTG GAG GAC TCT CTG GGA GGT TTG AAG AGG CGA ATG GAC TTG CTG GAA 3472
Val Glu Asp Ser Leu Gly Gly Leu Lys Arg Arg Met Asp Leu Leu Glu
80 85 90

GAA TCT AAT CAG CAG CTG CTG GCA ACT CTC AAC CGT CTC CGT ACA GGA 3520
Glu Ser Asn Gln Gln Leu Leu Ala Thr Leu Asn Arg Leu Arg Thr Gly
95 100 105

CTC GCT GCC TAT GTG CAG GCT AAC CTT GTG GGC GGC CAA GTT AAC CCC 3568
Leu Ala Ala Tyr Val Gln Ala Asn Leu Val Gly Gly Gln Val Asn Pro
110 115 120

TTT GTT TAAATAAAAA TACACTCATA CAGTTTATTA TGCTGTCAAT AAAATTCTTT 3624
Phe Val
125


ATTTTTCCTG TGATAATACC GTGTCCAGCG TGCTCTGTCA ATAAGGGTCC TATGCATCCT 3684

GAGAAGGGCC TCATATACCC ATGGCATGAA TATTAAGATA CATGGGCATA AGGCCCTCAG 3744

AAGGGTTGAG GTAGAGCCAC TGCAGACTTT CGTGGGGAGG TAAGGTGTTG TAAATAATCC 3804

AGTCATACTG ACTGTGCTGG GCGTGGAAGG AAAAGATGTC TTTTAGAAGA AGGGTGATTG 3864

GCAAAGGGAG GCTCTTAGTG TAGGTATTGA TAAATCTGTT CAGTTGGGAG GGATGCATTC 3924

GGGGGCTAAT AAGGTGGAGT TTAGCCTGAA TCTTAAGGTT GGCAATGTTG CCCCCTAGGT 3984

CTTTGCGAGG ATTCATGTTG TGCAGTACCA CAAAAACAGA GTAGCCTGTG CATTTGGGGA 4044

ATTTATCATG AAGCTT 4060



(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 125 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Met Ala Glu Glu Gly Arg Ile Tyr Val Pro Tyr Val Thr Ala Arg Leu
1 5 10 15

Pro Lys Trp Ser Gly Ser Val Gln Asp Lys Thr Gly Ser Asn Met Leu
20 25 30

Gly Gly Val Val Leu Pro Pro Asn Ser Gln Ala His Arg Thr Glu Thr
35 40 45

Val Gly Thr Glu Ala Thr Arg Asp Asn Leu His Ala Glu Gly Ala Arg
50 55 60

Arg Pro Glu Asp Gln Thr Pro Tyr Met Ile Leu Val Glu Asp Ser Leu
65 70 75 80

Gly Gly Leu Lys Arg Arg Met Asp Leu Leu Glu Glu Ser Asn Gln Gln
85 90 95

Leu Leu Ala Thr Leu Asn Arg Leu Arg Thr Gly Leu Ala Ala Tyr Val
100 105 110

Gln Ala Asn Leu Val Gly Gly Gln Val Asn Pro Phe Val
115 120 125

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 54 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Glu Glu Phe Val Leu Asp Tyr Val Glu His Pro Gly His Gly Cys Arg
1 5 10 15

Ser Cys His Tyr His Arg Arg Asn Thr Gly Asp Pro Asp Ile Met Cys
20 25 30

Ser Leu Cys Tyr Met Arg Thr Cys Gly Met Phe Val Tyr Ser Pro Val
35 40 45

Ser Glu Pro Glu Pro Glu
50

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Ile Asp Leu Thr Cys His Glu Ala Gly Phe Pro Pro Ser
1 5 10

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Leu Asp Phe Ser Thr Pro Gly Arg Ala Ala Ala Ala Val Ala Phe Leu
1 5 10 15

Ser Phe Ile

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

Gln Ser Ser Asn Ser Thr Ser
1 5

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 347 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Gln Lys Tyr Ser Ile Glu Gln Leu Thr Thr Tyr Trp Leu Gln Pro Gly
1 5 10 15

Asp Asp Phe Glu Glu Ala Ile Arg Val Tyr Ala Lys Val Ala Leu Arg
20 25 30

Pro Asp Cys Lys Tyr Lys Ile Ser Lys Leu Val Asn Ile Arg Asn Cys
35 40 45

Cys Tyr Ile Ser Gly Asn Gly Ala Glu Val Glu Ile Asp Thr Glu Asp
50 55 60

Arg Val Ala Phe Arg Cys Ser Met Ile Asn Met Trp Pro Gly Val Leu
65 70 75 80

Gly Met Asp Gly Val Val Ile Met Asn Val Arg Phe Thr Gly Pro Asn
85 90 95

Phe Ser Gly Thr Val Phe Leu Ala Asn Thr Asn Leu Ile Leu His Gly
100 105 110

Val Ser Phe Tyr Gly Phe Asn Asn Thr Cys Val Glu Ala Trp Thr Asp
115 120 125

Val Arg Val Arg Gly Cys Ala Phe Tyr Cys Cys Trp Lys Gly Val Val
130 135 140

Cys Arg Pro Lys Ser Arg Ala Ser Ile Lys Lys Cys Leu Phe Glu Arg
145 150 155 160

Cys Thr Leu Gly Ile Leu Ser Glu Gly Asn Ser Arg Val Arg His Asn
165 170 175

Val Ala Ser Asp Cys Gly Cys Phe Met Leu Val Lys Ser Val Ala Val
180 185 190

Ile Lys His Asn Met Val Cys Gly Asn Cys Glu Asp Arg Ala Ser Gln
195 200 205

Met Leu Thr Cys Ser Asp Gly Asn Cys His Leu Leu Lys Thr Ile His
210 215 220

Val Ala Ser His Ser Arg Lys Ala Trp Pro Val Phe Glu His Asn Ile
225 230 235 240

Leu His Arg Cys Ser Leu His Leu Gly Asn Arg Arg Gly Val Phe Leu
245 250 255

Pro Tyr Gln Cys Asn Leu Ser His Thr Lys Ile Leu Leu Glu Pro Glu
260 265 270

Ser Met Ser Lys Val Asn Leu Asn Gly Val Phe Asp Met Thr Met Lys
275 280 285

Ile Trp Lys Val Leu Arg Tyr Asp Glu Thr Arg Thr Arg Cys Arg Pro
290 295 300

Cys Glu Cys Gly Gly Lys His Ile Arg Asn Gln Pro Val Met Leu Asp
305 310 315 320

Val Thr Glu Glu Leu Arg Pro Asp His Leu Val Leu Ala Cys His Arg
325 330 335

Ala Glu Phe Gly Ser Ser Asp Glu Asp Thr Asp
340 345

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 140 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

Met Ser Thr Asn Ser Phe Asp Gly Ser Ile Val Ser Ser Tyr Leu Thr
1 5 10 15

Thr Arg Met Pro Pro Trp Ala Gly Val Arg Gln Asn Val Met Gly Ser
20 25 30

Ser Ile Asp Gly Arg Pro Val Leu Pro Ala Asn Ser Thr Thr Leu Thr
35 40 45

Tyr Glu Thr Val Ser Gly Thr Pro Leu Glu Thr Ala Ala Ser Ala Ala
50 55 60

Ala Ser Ala Ala Ala Ala Thr Ala Arg Gly Ile Val Thr Asp Phe Ala
65 70 75 80

Phe Leu Ser Pro Leu Ala Ser Ser Ala Ala Ser Arg Ser Ser Ala Arg
85 90 95

Asp Asp Lys Leu Thr Ala Leu Leu Ala Gln Leu Asp Ser Leu Thr Arg
100 105 110

Glu Leu Asn Val Val Ser Gln Gln Leu Leu Asp Leu Arg Gln Gln Val
115 120 125

Ser Ala Leu Lys Ala Ser Ser Pro Pro Asn Ala Val
130 135 140
(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5100 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 2..418

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

C CTC ATC AAA CAA CCC GTG GTG GGC ACC ACC CAC GTG GAA ATG CCT 46
Leu Ile Lys Gln Pro Val Val Gly Thr Thr His Val Glu Met Pro
1 5 10 15

CGC AAC GAA GTC CTA GAA CAA CAT CTG ACC TCA CAT GGC GCT CAA ATC 94
Arg Asn Glu Val Leu Glu Gln His Leu Thr Ser His Gly Ala Gln Ile
20 25 30

GCG GGC GGA GGC GCT GCG GGC GAT TAC TTT AAA AGC CCC ACT TCA GCT 142
Ala Gly Gly Gly Ala Ala Gly Asp Tyr Phe Lys Ser Pro Thr Ser Ala
35 40 45

CGA ACC CTT ATC CCG CTC ACC GCC TCC TGC TTA AGA CCA GAT GGA GTC 190
Arg Thr Leu Ile Pro Leu Thr Ala Ser Cys Leu Arg Pro Asp Gly Val
50 55 60

TTT CAA CTA GGA GGA GGC TCG CGT TCA TCT TTC AAC CCC CTG CAA ACA 238
Phe Gln Leu Gly Gly Gly Ser Arg Ser Ser Phe Asn Pro Leu Gln Thr
65 70 75

GAT TTT GCC TTC CAC GCC CTG CCC TCC AGA CCG CGC CAC GGG GGC ATA 286
Asp Phe Ala Phe His Ala Leu Pro Ser Arg Pro Arg His Gly Gly Ile
80 85 90 95

GGA TCC AGG CAG TTT GTA GAG GAA TTT GTG CCC GCC GTC TAC CTC AAC 334
Gly Ser Arg Gln Phe Val Glu Glu Phe Val Pro Ala Val Tyr Leu Asn
100 105 110

CCC TAC TCG GGA CCG CCG GAC TCT TAT CCG GAC CAG TTT ATA CGC CAC 382
Pro Tyr Ser Gly Pro Pro Asp Ser Tyr Pro Asp Gln Phe Ile Arg His
115 120 125

TAC AAC GTG TAC AGC AAC TCT GTG AGC GGT TAT AGC TGAGATTGTA 428
Tyr Asn Val Tyr Ser Asn Ser Val Ser Gly Tyr Ser
130 135

AGACTCTCCT ATCTGTCTCT GTGCTGCTTT TCCGCTTCAA GCCCCACAAG CATGAAGGGG 488

TTTCTGCTCA TCTTCAGCCT GCTTGTGCAT TGTCCCCTAA TTCATGTTGG GACCATTAGC 548

TTCTATGCTG CAAGGCCCGG GTCTGAGCCT AACGCGACTT ATGTTTGTGA CTATGGAAGC 608

GAGTCAGATT ACAACCCCAC CACGGTTCTG TGGTTGGCTC GAGAGACCGA TGGCTCCTGG 668

ATCTCTGTTC TTTTCCGTCA CAACGGCTCC TCAACTGCAG CCCCCGGGGT CGTCGCGCAC 728

TTTACTGACC ACAACAGCAG CATTGTGGTG CCCCAGTATT ACCTCCTCAA CAACTCACTC 788

TCTAAGCTCT GCTGCTCATA CCGGCACAAC GAGCGTTCTC AGTTTACCTG CAAACAAGCT 848

GACGTCCCTA CCTGTCACGA GCCCGGCAAG CCGCTCACCC TCCGCGTCTC CCCCGCGCTG 908

GGAACTGCCC ACCAAGCAGT CACTTGGTTT TTTCAAAATG TACCCATAGC TACTGTTTAC 968

CGACCTTGGG GCAATGTAAC TTGGTTTTGT CCTCCCTTCA TGTGTACCTT TAATGTCAGC 1028

CTGAACTCCC TACTTATTTA CAACTTTTCT GACAAAACCG GGGGGCAATA CACAGCTCTC 1088

ATGCACTCCG GACCTGCTTC CCTCTTTCAG CTCTTTAAGC CAACGACTTG TGTCACCAAG 1148
GTGGAGGACC CGCCGTATGC CAACGACCCG GCCTCGCCTG TGTGGCGCCC ACTGCTTTTT 1208
GCCTTCGTCC TCTGCACCGG CTGCGCGGTG TTGTTAACCG CCTTCGGTCC ATCGATTCTA 1268 
TCCGGTACCC GAAAGCTTAT CTCAGCCCGC TTTTGGAGTC CCGAGCCCTA TACCACCCTC 1328 
CACTAACAGT CCCCCCATGG AGCCAGACGG AGTTCATGCC GAGCAGCAGT TTATCCTCAA 1388 
TCAGATTTCC TGCGCCAACA CTGCCCTCCA GCGTCAAAGG GAGGAACTAG CTTCCCTTGT 1448
CATGTTGCAT GCCTGTAAGC GTGGCCTCTT TTGTCCAGTC AAAACTTACA AGCTCAGCCT 1508 
CAACGCCTCG GCCAGCGAGC ACAGCCTGCA CTTTGAAAAA AGTCCCTCCC GATTCACCCT 1568 
GGTCAACACT CACGCCGGAG CTTCTGTGCG AGTGGCCCTA CACCACCAGG GAGCTTCCGG 1628
CAGCATCCGC TGTTCCTGTT CCCACGCCGA GTGCCTCCCC GTCCTCCTCA AGACCCTCTG 1688 

TGCCTTTAAC TTTTTAGATT AGCTGAAAGC AAATATAAAA TGGTGTGCTT ACCGTAATTC 1748
TGTTTTGACT TGTGTGCTTG ATTTCTCCCC CTGCGCCGTA ATCCAGTGCC CCTCTTCAAA 1808 
ACTCTCGTAC CCTATGCGAT TCGCATAGGC ATATTTTCTA AAAGCTCTGA AGTCAACATC 1868 
ACTCTCAAAC ACTTCTCCGT TGTAGGTTAC TTTCATCTAC AGATAAAGTC ATCCACCGGT 1928 
TAACATCATG AAGAGAAGTG TGCCCCAGGA CTTTAATCTT GTGTATCCGT ACAAGGCTAA 1988 

GAGGCCCAAC ATCATGCCGC CCTTTTTTGA CCGCAATGGC TTTGTTGAAA ACCAAGAAGC 2048

CACGCTAGCC ATGCTTGTGG AAAAGCCGCT CACGTTCGAC AAGGAAGGTG CGCTGACCCT 2108 

GGGCGTCGGA CGCGGCATCC GCATTAACCC CGCGGGGCTT CTGGAGACAA ACGACCTCGC 2168 

GTCCGCTGTC TTCCCACCGC TGGCCTCCGA TGAGGCCGGC AACGTCACGC TCAACATGTC 2228 

TGACGGGCTA TATACTAAGG ACAACAAGCT AGCTGTCAAA GTAGGTCCCG GGCTGTCCCT 2288 

CGACTCCAAT AATGCTCTCC AGGTCCACAC AGGCGACGGG CTCACGGTAA CCGATGACAA 2348

GGTGTCTCTA AATACCCAAG CTCCCCTCTC GACCACCAGC GCGGGCCTCT CCCTACTTCT 2408 

GGGTCCCAGC CTCCACTTAG GTGAGGAGGA ACGACTAACA GTAAACACCG GAGCGGGCCT 2468 

CCAAATTAGC AATAACGCTC TGGCCGTAAA AGTAGGTTCA GGTATCACCG TAGATGCTCA 2528 

AAACCAGCTC GCTGCATCCC TGGGGGACGG TCTAGAAAGC AGAGATAATA AAACTGTCGT 2588 

TAAGGCTGGG CCCGGACTTA CAATAACTAA TCAAGCTCTT ACTGTTGCTA CCGGGAACGG 2648

CCTTCAGGTC AACCCGGAAG GGCAACTGCA GCTAAACATT ACTGCCGGTC AGGGCCTCAA 2708 

CTTTGCAAAC AACAGCCTCG CCGTGGAGCT GGGCTCGGGC CTGCATTTTC CCCCTGGCCA 2768 

AAACCAAGTA AGCCTTTATC CCGGAGATGG AATAGACATC CGAGATAATA GGGTGACTGT 2828 

GCCCGCTGGG CCAGGCCTGA GAATGCTCAA CCACCAACTT GCCGTAGCTT CCGGAGACGG 2888 

TTTAGAAGTC CACAGCGACA CCCTCCGGTT AAAGCTCTCC CACGGCCTGA CATTTGAAAA 2948

TGGCGCCGTA CGAGCAAAAC TAGGACCAGG ACTTGGCACA GACGACTCTG GTCGGTCCGT 3008 

GGTTCGCACA GGTCGAGGAC TTAGAGTTGC AAACGGCCAA GTCCAGATCT TCAGCGGAAG 3068 

AGGCACCGCC ATCGGCACTG ATAGCAGCCT CACTCTCAAC ATCCGGGCGC CCCTACAATT 3128 

TTCTGGACCC GCCTTGACTG CTAGTTTGCA AGGCAGTGGT CCGATTACTT ACAACAGCAA 3188 

CAATGGCACT TTCGGTCTCT CTATAGGCCC CGGAATGTGG GTAGACCAAA ACAGACTTCA 3248

GGTAAACCCA GGCGCTGGTT TAGTCTTCCA AGGAAACAAC CTTGTCCCAA ACCTTGCGGA 3308 

TCCGCTGGCT ATTTCCGACA GCAAAATTAG TCTCAGTCTC GGTCCCGGCC TGACCCAAGC 3368 

TTCCAACGCC CTGACTTTAA GTTTAGGAAA CGGGCTTGAA TTCTCCAATC AAGCCGTTGC 3428 

TATAAAAGCG GGCCGGGGCT TACGCTTTGA GTCTTCCTCA CAAGCTTTAG AGAGCAGCCT 3488 
CACAGTCGGA AATGGCTTAA CGCTTACCGA TACTGTGATC CGCCCCAACC TAGGGGACGG 3548

CCTAGAGGTC AGAGACAATA AAATCATTGT TAAGCTGGGC GCGAATCTTC GTTTTGAAAA 3608 

CGGAGCCGTA ACCGCCGGCA CCGTTAACCC TTCTGCGCCC GAGGCACCAC CAACTCTCAC 3668 

TGCAGAACCA CCCCTCCGAG CCTCCAACTC CCATCTTCAA CTGTCCCTAT CGGAGGGCTT 3728 

GGTTGTGCAT AACAACGCCC TTGCTCTCCA ACTGGGAGAC GGCATGGAAG TAAATCAGCA 3788 

CGGACTTACT TTAAGAGTAG GCTCGGGTTT GCAAATGCGT GACGGCATTT TAACAGTTAC 3848 

ACCCAGCGGC ACTCCTATTG AGCCCAGACT GACTGCCCCA CTGACTCAGA CAGAGAATGG 3908 

AATCGGGCTC GCTCTCGGCG CCGGCTTGGA ATTAGACGAG AGCGCGCTCC AAGTAAAAGT 3968 

TGGGCCCGGC ATGCGCCTGA ACCCTGTAGA AAAGTATGTA ACCCTGCTCC TGGGTCCTGG 4028 

CCTTAGTTTT GGGCAGCCGG CCAACAGGAC AAATTATGAT GTGCGCGTTT CTGTGGAGCC 4088 

CCCCATGGTT TTCGGACAGC GTGGTCAGCT CACATTTTTA GTGGGTCACG GACTACACAT 4148 

TCAAAATTCC AAACTTCAGC TCAATTTGGG ACAAGGCCTC AGAACTGACC CCGTCACCAA 4208

CCAGCTGGAA GTGCCCCTCG GTCAAGGTTT GGAAATTGCA GACGAATCCC AGGTTAGGGT 4268 

TAAATTGGGC GATGGCCTGC AGTTTGATTC ACAAGCTCGC ATCACTACCG CTCCTAACAT 4328 

GGTCACTGAA ACTCTGTGGA CCGGAACAGG CAGTAATGCT AATGTTACAT GGCGGGGCTA 4388

CACTGCCCCC GGCAGCAAAC TCTTTTTGAG TCTCACTCGG TTCAGCACTG GTCTAGTTTT 4448 

AGGAAACATG ACTATTGACA GCAATGCATC CTTTGGGCAA TACATTAACG CGGGACACGA 4508 

ACAGATCGAA TGCTTTATAT TGTTGGACAA TCAGGGTAAC CTAAAAGAAG GATCTAACTT 4568 

GCAAGGCACT TGGGAAGTGA AGAACAACCC CTCTGCTTCC AAAGCTGCTT TTTTGCCTTC 4628 

CACCGCCCTA TACCCCATCC TCAACGAAAG CCGAGGGAGT CTTCCTGGAA AAAATCTTGT 4688 

GGGCATGCAA GCCATACTGG GAGGCGGGGG CACTTGCACT GTGATAGCCA CCCTCAATGG 4748

CAGACGCAGC AACAACTATC CCGCGGGCCA GTCCATAATT TTCGTGTGGC AAGAATTCAA 4808 

CACCATAGCC CGCCAACCTC TGAACCACTC TACACTTACT TTTTCTTACT GGACTTAAAT 4868 

AAGTTGGAAA TAAAGAGTTA AACTGAATGT TTAAGTGCAA CAGACTTTTA TTGGTTTTGG 4928 

CTCACAACAA ATTACAACAG CATAGACAAG TCATACCGGT CAAACAACAC AGGCTCTCGA 4988 

AAACGGGCTA ACCGCTCCAA GAATCTGTCA CGCAGACGAG CAAGTCCTAA ATGTTTTTTC 5048 

ACTCTCTTCG GGGCCAAGTT CAGCATGTAT CGGATTTTCT GCTTACACCT TT 5100 

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 139 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

Leu Ile Lys Gln Pro Val Val Gly Thr Thr His Val Glu Met Pro Arg
1 5 10 15

Asn Glu Val Leu Glu Gln His Leu Thr Ser His Gly Ala Gln Ile Ala
20 25 30

Gly Gly Gly Ala Ala Gly Asp Tyr Phe Lys Ser Pro Thr Ser Ala Arg
35 40 45

Thr Leu Ile Pro Leu Thr Ala Ser Cys Leu Arg Pro Asp Gly Val Phe
50 55 60

Gln Leu Gly Gly Gly Ser Arg Ser Ser Phe Asn Pro Leu Gln Thr Asp
65 70 75 80

Phe Ala Phe His Ala Leu Pro Ser Arg Pro Arg His Gly Gly Ile Gly
85 90 95

Ser Arg Gln Phe Val Glu Glu Phe Val Pro Ala Val Tyr Leu Asn Pro
100 105 110

Tyr Ser Gly Pro Pro Asp Ser Tyr Pro Asp Gln Phe Ile Arg His Tyr
115 120 125

Asn Val Tyr Ser Asn Ser Val Ser Gly Tyr Ser
130 135

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5100 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 408..1331

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

CCTCATCAAA CAACCCGTGG TGGGCACCAC CCACGTGGAA ATGCCTCGCA ACGAAGTCCT 60 

AGAACAACAT CTGACCTCAC ATGGCGCTCA AATCGCGGGC GGAGGCGCTG CGGGCGATTA 120 

CTTTAAAAGC CCCACTTCAG CTCGAACCCT TATCCCGCTC ACCGCCTCCT GCTTAAGACC 180 

AGATGGAGTC TTTCAACTAG GAGGAGGCTC CCGTTCATCT TTCAACCCCC TGCAAACAGA 240 

TTTTGCCTTC CACGCCCTGC CCTCCAGACC GCGCCACGGG GGCATAGGAT CCAGGCAGTT 300 

TGTAGAGGAA TTTGTGCCCG CCGTCTACCT CAACCCCTAC TCGGGACCGC CGGACTCTTA 360 



TCCGGACCAG TTTATACGCC ACTACAACGT GTACAGCAAC TCTGTGA GCG GTT ATA 416
Ala Val Ile
1

GCT GAG ATT GTA AGA CTC TCC TAT CTG TCT CTG TGC TGC TTT TCC GCT 464
Ala Glu Ile Val Arg Leu Ser Tyr Leu Ser Leu Cys Cys Phe Ser Ala
5 10 15

TCA AGC CCC ACA AGC ATG AAG GGG TTT CTG CTC ATC TTC AGC CTG CTT 512
Ser Ser Pro Thr Ser Met Lys Gly Phe Leu Leu Ile Phe Ser Leu Leu
20 25 30 35

GTG CAT TGT CCC CTA ATT CAT GTT GGG ACC ATT AGC TTC TAT GCT GCA 560
Val His Cys Pro Leu Ile His Val Gly Thr Ile Ser Phe Tyr Ala Ala
40 45 50

AGG CCC GGG TCT GAG CCT AAC GCG ACT TAT GTT TGT GAC TAT GGA AGC 608
Arg Pro Gly Ser Glu Pro Asn Ala Thr Tyr Val Cys Asp Tyr Gly Ser
55 60 65

GAG TCA GAT TAC AAC CCC ACC ACG GTT CTG TGG TTG GCT CGA GAG ACC 656
Glu Ser Asp Tyr Asn Pro Thr Thr Val Leu Trp Leu Ala Arg Glu Thr
70 75 80

GAT GGC TCC TGG ATC TCT GTT CTT TTC CGT CAC AAC GGC TCC TCA ACT 704
Asp Gly Ser Trp Ile Ser Val Leu Phe Arg His Asn Gly Ser Ser Thr
85 90 95

GCA GCC CCC GGG GTC GTC GCG CAC TTT ACT GAC CAC AAC AGC AGC ATT 752
Ala Ala Pro Gly Val Val Ala His Phe Thr Asp His Asn Ser Ser Ile
100 105 110 115

GTG GTG CCC CAG TAT TAC CTC CTC AAC AAC TCA CTC TCT AAG CTC TGC 800
Val Val Pro Gln Tyr Tyr Leu Leu Asn Asn Ser Leu Ser Lys Leu Cys
120 125 130

TGC TCA TAC CGG CAC AAC GAG CGT TCT CAG TTT ACC TGC AAA CAA GCT 848
Cys Ser Tyr Arg His Asn Glu Arg Ser Gln Phe Thr Cys Lys Gln Ala
135 140 145

GAC GTC CCT ACC TGT CAC GAG CCC GGC AAG CCG CTC ACC CTC CGC GTC 896
Asp Val Pro Thr Cys His Glu Pro Gly Lys Pro Leu Thr Leu Arg Val
150 155 160

TCC CCC GCG CTG GGA ACT GCC CAC CAA GCA GTC ACT TGG TTT TTT CAA 944
Ser Pro Ala Leu Gly Thr Ala His Gln Ala Val Thr Trp Phe Phe Gln
165 170 175

AAT GTA CCC ATA GCT ACT GTT TAC CGA CCT TGG GGC AAT GTA ACT TGG 992
Asn Val Pro Ile Ala Thr Val Tyr Arg Pro Trp Gly Asn Val Thr Trp
180 185 190 195

TTT TGT CCT CCC TTC ATG TGT ACC TTT AAT GTC AGC CTG AAC TCC CTA 1040
Phe Cys Pro Pro Phe Met Cys Thr Phe Asn Val Ser Leu Asn Ser Leu
200 205 210

CTT ATT TAC AAC TTT TCT GAC AAA ACC GGG GGG CAA TAC ACA GCT CTC 1088
Leu Ile Tyr Asn Phe Ser Asp Lys Thr Gly Gly Gln Tyr Thr Ala Leu
215 220 225

ATG CAC TCC GGA CCT GCT TCC CTC TTT CAG CTC TTT AAG CCA ACG ACT 1136
Met His Ser Gly Pro Ala Ser Leu Phe Gln Leu Phe Lys Pro Thr Thr
230 235 240

TGT GTC ACC AAG GTG GAG GAC CCG CCG TAT GCC AAC GAC CCG GCC TCG 1184
Cys Val Thr Lys Val Glu Asp Pro Pro Tyr Ala Asn Asp Pro Ala Ser
245 250 255

CCT GTG TGG CGC CCA CTG CTT TTT GCC TTC GTC CTC TGC ACC GGC TGC 1232
Pro Val Trp Arg Pro Leu Leu Phe Ala Phe Val Leu Cys Thr Gly Cys
260 265 270 275

GCG GTG TTG TTA ACC GCC TTC GGT CCA TCG ATT CTA TCC GGT ACC CGA 1280
Ala Val Leu Leu Thr Ala Phe Gly Pro Ser Ile Leu Ser Gly Thr Arg
280 285 290

AAG CTT ATC TCA GCC CGC TTT TGG AGT CCC GAG CCC TAT ACC ACC CTC 1328
Lys Leu Ile Ser Ala Arg Phe Trp Ser Pro Glu Pro Tyr Thr Thr Leu
295 300 305

CAC TAACAGTCCC CCCATGGAGC CAGACGGAGT TCATGCCGAG CAGCAGTTTA 1381
His


TCCTCAATCA GATTTCCTGC GCCAACACTG CCCTCCAGCG TCAAAGGGAG GAACTAGCTT 1441

CCCTTGTCAT GTTGCATGCC TGTAAGCGTG GCCTCTTTTG TCCAGTCAAA ACTTACAAGC 1501

TCAGCCTCAA CGCCTCGGCC AGCGAGCACA GCCTGCACTT TGAAAAAAGT CCCTCCCGAT 1561

TCACCCTGGT CAACACTCAC GCCGGAGCTT CTGTGCGAGT GGCCCTACAC CACCAGGGAG 1621

CTTCCGGCAG CATCCGCTGT TCCTGTTCCC ACGCCGAGTG CCTCCCCGTC CTCCTCAAGA 1681

CCCTCTGTGC CTTTAACTTT TTAGATTAGC TGAAAGCAAA TATAAAATGG TGTGCTTACC 1741

GTAATTCTGT TTTGACTTGT GTGCTTGATT TCTCCCCCTG CGCCGTAATC CAGTGCCCCT 1801

CTTCAAAACT CTCGTACCCT ATGCGATTCG CATAGGCATA TTTTCTAAAA GCTCTGAAGT 1861

CAACATCACT CTCAAACACT TCTCGGTTGT AGGTTACTTT CATCTACAGA TAAAGTCATC 1921

CACCGGTTAA CATCATGAAG AGAAGTGTGC CCCAGGACTT TAATCTTGTG TATCCGTACA 1981

AGGCTAAGAG GCCCAACATC ATGCCGCCCT TTTTTGACCG CAATGGCTTT GTTGAAAACC 2041

AAGAAGCCAC GCTAGCCATG CTTGTGGAAA AGCCGCTCAC GTTCGACAAG GAAGGTGCGC 2101

TGACCCTGGG CGTCGGACGC GGCATCCGCA TTAACCCCGC GGGGCTTCTG GAGACAAACG 2161
ACCTCGCGTC CGCTGTCTTC CCACCGCTGG CCTCCGATGA GGCCGGCAAC GTCACGCTCA 2221

ACATGTCTGA CGGGCTATAT ACTAAGGACA ACAAGCTAGC TGTCAAAGTA GGTCCCGGGC 2281 

TGTCCCTCGA CTCCAATAAT GCTCTCCAGG TCCACACAGG CGACGGGCTC ACGGTAACCG 2341 

ATGACAAGGT GTCTCTAAAT ACCCAAGCTC CCCTCTCGAC CACCAGCGCG GGCCTCTCCC 2401 

TACTTCTGGG TCCCAGCCTC CACTTAGGTG AGGAGGAACG ACTAACAGTA AACACCGGAG 2461

CGGGCCTCCA AATTAGCAAT AACGCTCTGG CCGTAAAAGT AGGTTCAGGT ATCACCGTAG 2521

ATGCTCAAAA CCAGCTCGCT GCATCCCTGG GGGACGGTCT AGAAAGCAGA GATAATAAAA 2581 

CTGTCGTTAA GGCTGGGCCC GGACTTACAA TAACTAATCA AGCTCTTACT GTTGCTACCG 2641 

GGAACGGCCT TCAGGTCAAC CCGGAAGGGC AACTGCAGCT AAACATTACT GCCGGTCAGG 2701

GCCTCAACTT TGCAAACAAC AGCCTCGCCG TGGAGCTGGG CTCGGGCCTG CATTTTCCCC 2761

CTGGCCAAAA CCAAGTAAGC CTTTATCCCG GAGATGGAAT AGACATCCGA GATAATAGGG 2821 

TGACTGTGCC CGCTGGGCCA GGCCTGAGAA TGCTCAACCA CCAACTTGCC GTAGCTTCCG 2881 

GAGACGGTTT AGAAGTCCAC AGCGACACCC TCCGGTTAAA GCTCTCCCAC GGCCTGACAT 2941 

TTGAAAATGG CGCCGTACGA GCAAAACTAG GACCAGGACT TGGCACAGAC GACTCTGGTC 3001 

GGTCCGTGGT TCGCACAGGT CGAGGACTTA GAGTTGCAAA CGGCCAAGTC CAGATCTTCA 3061

GCGGAAGAGG CACCGCCATC GGCACTGATA GCAGCCTCAC TCTCAACATC CGGGCGCCCC 3121 

TACAATTTTC TGGACCCGCC TTGACTGCTA GTTTGCAAGG CAGTGGTCCG ATTACTTACA 3181

ACAGCAACAA TGGCACTTTC GGTCTCTCTA TAGGCCCCGG AATGTGGGTA GACCAAAACA 3241

GACTTCAGGT AAACCCAGGC GCTGGTTTAG TCTTCCAAGG AAACAACCTT GTCCCAAACC 3301 

TTGCGGATCC GCTGGCTATT TCCGACAGCA AAATTAGTCT CAGTCTCGGT CCCGGCCTGA 3361

CCCAAGCTTC CAACGCCCTG ACTTTAAGTT TAGGAAACGG GCTTGAATTC TCCAATCAAG 3421 

CCGTTGCTAT AAAAGCGGGC CGGGGCTTAC GCTTTGAGTC TTCCTCACAA GCTTTAGAGA 3481 

GCAGCCTCAC AGTCGGAAAT GGCTTAACGC TTACCGATAC TGTGATCCGC CCCAACCTAG 3541

GGGACGGCCT AGAGGTCAGA GACAATAAAA TCATTGTTAA GCTGGGCGCG AATCTTCGTT 3601 

TTGAAAACGG AGCCGTAACC GCCGGCACCG TTAACCCTTC TGCGCCCGAG GCACCACCAA 3661

CTCTCACTGC AGAACCACCC CTCCGAGCCT CCAACTCCCA TCTTCAACTG TCCCTATCGG 3721 

AGGGCTTGGT TGTGCATAAC AACGCCCTTG CTCTCCAACT GGGAGACGGC ATGGAAGTAA 3781 

ATCAGCACGG ACTTACTTTA AGAGTAGGCT CGGGTTTGCA AATGCGTGAC GGCATTTTAA 3841 

CAGTTACACC CAGCGGCACT CCTATTGAGC CCAGACTGAC TGCCCCACTG ACTCAGACAG 3901 

AGAATGGAAT CGGGCTCGCT CTCGGCGCCG GCTTGGAATT AGACGAGAGC GCGCTCCAAG 3961

TAAAAGTTGG GCCCGGCATG CGCCTGAACC CTGTAGAAAA GTATGTAACC CTGCTCCTGG 4021 

GTCCTGGCCT TAGTTTTGGG CAGCCGGCCA ACAGGACAAA TTATGATGTG CGCGTTTCTG 4081

TGGAGCCCCC CATGGTTTTC GGACAGCGTG GTCAGCTCAC ATTTTTAGTG GGTCACGGAC 4141 

TACACATTCA AAATTCCAAA CTTCAGCTCA ATTTGGGACA AGGCCTCAGA ACTGACCCCG 4201 

TCACCAACCA GCTGGAAGTG CCCCTCGGTC AAGGTTTGGA AATTGCAGAC GAATCCCAGG 4261

TTAGGGTTAA ATTGGGCGAT GGCCTGCAGT TTGATTCACA AGCTCGCATC ACTACCGCTC 4321 

CTAACATGGT CACTGAAACT CTGTGGACCG GAACAGGCAG TAATGCTAAT GTTACATGGC 4381 

GGGGCTACAC TGCCCCCGGC AGCAAACTCT TTTTGAGTCT CACTCGGTTC AGCACTGGTC 4441 

TAGTTTTAGG AAACATGACT ATTGACAGCA ATGCATCCTT TGGGCAATAC ATTAACGCGG 4501 
GACACGAACA GATCGAATGC TTTATATTGT TGGACAATCA GGGTAACCTA AAAGAAGGAT 4561

CTAACTTGCA AGGCACTTGG GAAGTGAAGA ACAACCCCTC TGCTTCCAAA GCTGCTTTTT 4621 

TGCCTTCCAC CGCCCTATAC CCCATCCTCA ACGAAAGCCG AGGGAGTCTT CCTGGAAAAA 4681 

ATCTTGTGGG CATGCAAGCC ATACTGGGAG GCGGGGGCAC TTGCACTGTG ATAGCCACCC 4741 

TCAATGGCAG ACGCAGCAAC AACTATCCCG CGGGCCAGTC CATAATTTTC GTGTGGCAAG 4801

AATTCAACAC CATAGCCCGC CAACCTCTGA ACCACTCTAC ACTTACTTTT TCTTACTGGA 4861 

CTTAAATAAG TTGGAAATAA AGAGTTAAAC TGAATGTTTA AGTGCAACAG ACTTTTATTG 4921

GTTTTGGCTC ACAACAAATT ACAACAGCAT AGACAAGTCA TACCGGTCAA ACAACACAGG 4981

CTCTCGAAAA CGGGCTAACC GCTCCAAGAA TCTGTCACGC AGACGAGCAA GTCCTAAATG 5041 

TTTTTTCACT CTCTTCGGGG CCAAGTTCAG CATGTATCGG ATTTTCTGCT TACACCTTT 5100

(2) INFORMATION FOR SEQ ID NO:18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 308 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

Ala Val Ile Ala Glu Ile Val Arg Leu Ser Tyr Leu Ser Leu Cys Cys
1 5 10 15

Phe Ser Ala Ser Ser Pro Thr Ser Met Lys Gly Phe Leu Leu Ile Phe
20 25 30

Ser Leu Leu Val His Cys Pro Leu Ile His Val Gly Thr Ile Ser Phe
35 40 45

Tyr Ala Ala Arg Pro Gly Ser Glu Pro Asn Ala Thr Tyr Val Cys Asp
50 55 60

Tyr Gly Ser Glu Ser Asp Tyr Asn Pro Thr Thr Val Leu Trp Leu Ala
65 70 75 80

Arg Glu Thr Asp Gly Ser Trp Ile Ser Val Leu Phe Arg His Asn Gly
85 90 95

Ser Ser Thr Ala Ala Pro Gly Val Val Ala His Phe Thr Asp His Asn
100 105 110

Ser Ser Ile Val Val Pro Gln Tyr Tyr Leu Leu Asn Asn Ser Leu Ser
115 120 125

Lys Leu Cys Cys Ser Tyr Arg His Asn Glu Arg Ser Gln Phe Thr Cys
130 135 140

Lys Gln Ala Asp Val Pro Thr Cys His Glu Pro Gly Lys Pro Leu Thr
145 150 155 160

Leu Arg Val Ser Pro Ala Leu Gly Thr Ala His Gln Ala Val Thr Trp
165 170 175

Phe Phe Gln Asn Val Pro Ile Ala Thr Val Tyr Arg Pro Trp Gly Asn
180 185 190

Val Thr Trp Phe Cys Pro Pro Phe Met Cys Thr Phe Asn Val Ser Leu
195 200 205

Asn Ser Leu Leu Ile Tyr Asn Phe Ser Asp Lys Thr Gly Gly Gln Tyr
210 215 220

Thr Ala Leu Met His Ser Gly Pro Ala Ser Leu Phe Gln Leu Phe Lys
225 230 235 240

Pro Thr Thr Cys Val Thr Lys Val Glu Asp Pro Pro Tyr Ala Asn Asp
245 250 255

Pro Ala Ser Pro Val Trp Arg Pro Leu Leu Phe Ala Phe Val Leu Cys
260 265 270

Thr Gly Cys Ala Val Leu Leu Thr Ala Phe Gly Pro Ser Ile Leu Ser
275 280 285

Gly Thr Arg Lys Leu Ile Ser Ala Arg Phe Trp Ser Pro Glu Pro Tyr
290 295 300

Thr Thr Leu His
305

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5100 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 529..954

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:



CCTCATCAAA CAACCCGTGG TGGGCACCAC CCACGTGGAA ATGCCTCGCA ACGAAGTCCT 60

AGAACAACAT CTGACCTCAC ATGGCGCTCA AATCGCGGGC GGAGGCGCTG CGGGCGATTA 120

CTTTAAAAGC CCCACTTCAG CTCGAACCCT TATCCCGCTC ACCGCCTCCT GCTTAAGACC 180

AGATGGAGTC TTTCAACTAG GAGGAGGCTC GCGTTCATCT TTCAACCCCC TGCAAACAGA 240

TTTTGCCTTC CACGCCCTGC CCTCCAGACC GCGCCACGGG GGCATAGGAT CCAGGCAGTT 300

TGTAGAGGAA TTTGTGCCCG CCGTCTACCT CAACCCCTAC TCGGGACCGC CGGACTCTTA 360

TCCGGACCAG TTTATACGCC ACTACAACGT GTACAGCAAC TCTGTGAGCG GTTATAGCTG 420

AGATTGTAAG ACTCTCCTAT CTGTCTCTGT GCTGCTTTTC CGCTTCAAGC CCCACAAGCA 480

TGAAGGGGTT TCTGCTCATC TTCAGCCTGC TTGTGCATTG TCCCCTAA TTC ATG TTG 537
Phe Met Leu
1

GGA CCA TTA GCT TCT ATG CTG CAA GGC CCG GGT CTG AGC CTA ACG CGA 585 
Gly Pro Leu Ala Ser Met Leu Gln Gly Pro Gly Leu Ser Leu Thr Arg
5 10 15 

CTT ATG TTT GTG ACT ATG GAA GCG AGT CAG ATT ACA ACC CCA CCA CGG 633 
Leu Met Phe Val Thr Met Glu Ala Ser Gln Ile Thr Thr Pro Pro Arg
20 25 30 35 

TTC TGT GGT TGG CTC GAG AGA CCG ATG GCT CCT GGA TCT CTG TTC TTT 681 
Phe Cys Gly Trp Leu Glu Arg Pro Met Ala Pro Gly Ser Leu Phe Phe 
40 45 50 

TCC GTC ACA ACG GCT CCT CAA CTG CAG CCC CCG GGG TCG TCG CGC ACT 729
Ser Val Thr Thr Ala Pro Gln Leu Gln Pro Pro Gly Ser Ser Arg Thr
55 60 65 

TTA CTG ACC ACA ACA GCA GCA TTG TGG TGC CCC AGT ATT ACC TCC TCA 777
Leu Leu Thr Thr Thr Ala Ala Leu Trp Cys Pro Ser Ile Thr Ser Ser
70 75 80 



ACA ACT CAC TCT CTA AGC TCT GCT GCT CAT ACC GGC ACA ACG AGC GTT 825
Thr Thr His Ser Leu Ser Ser Ala Ala His Thr Gly Thr Thr Ser Val
85 90 95

CTC AGT TTA CCT GCA AAC AAG CTG ACG TCC CTA CCT GTC ACG AGC CCG 873
Leu Ser Leu Pro Ala Asn Lys Leu Thr Ser Leu Pro Val Thr Ser Pro
100 105 110 115 

GCA AGC CGC TCA CCC TCC GCG TCT CCC CCG CGC TGG GAA CTG CCC ACC 921
Ala Ser Arg Ser Pro Ser Ala Ser Pro Pro Arg Trp Glu Leu Pro Thr 
120 125 130 

AAG CAG TCA CTT GGT TTT TTC AAA ATG TAC CCA TAGCTACTGT TTACCGACCT 974
Lys Gln Ser Leu Gly Phe Phe Lys Met Tyr Pro
135 140

TGGGGCAATG TAACTTGGTT TTGTCCTCCC TTCATGTGTA CCTTTAATGT CAGCCTGAAC 1034 

TCCCTACTTA TTTACAACTT TTCTGACAAA ACCGGGGGGC AATACACAGC TCTCATGCAC 1094 

TCCGGACCTG CTTCCCTCTT TCAGCTCTTT AAGCCAACGA CTTGTGTCAC CAAGGTGGAG 1154 

GACCCGCCGT ATGCCAACGA CCCGGCCTCG CCTGTGTGGC GCCCACTGCT TTTTGCCTTC 1214

GTCCTCTGCA CCGGCTGCGC GGTGTTGTTA ACCGCCTTCG GTCCATCGAT TCTATCCGGT 1274 

ACCCGAAAGC TTATCTCAGC CCGCTTTTGG AGTCCCGAGC CCTATACCAC CCTCCACTAA 1334 

CAGTCCCCCC ATGGAGCCAG ACGGAGTTCA TGCCGAGCAG CAGTTTATCC TCAATCAGAT 1394 

TTCCTGCGCC AACACTGCCC TCCAGCGTCA AAGGGAGGAA CTAGCTTCCC TTGTCATGTT 1454 

GCATGCCTGT AAGCGTGGCC TCTTTTGTCC AGTCAAAACT TACAAGCTCA GCCTCAACGC 1514

CTCGGCCAGC GAGCACAGCC TGCACTTTGA AAAAAGTCCC TCCCGATTCA CCCTGGTCAA 1574 

CACTCACGCC GGAGCTTCTG TGCGAGTGGC CCTACACCAC CAGGGAGCTT CCGGCAGCAT 1634 

CCGCTGTTCC TGTTCCCACG CCGAGTGCCT CCCCGTCCTC CTCAAGACCC TCTGTGCCTT 1694 

TAACTTTTTA GATTAGCTGA AAGCAAATAT AAAATGGTGT GCTTACCGTA ATTCTGTTTT 1754 

GACTTGTGTG CTTGATTTCT CCCCCTGCGC CGTAATCCAG TGCCCCTCTT CAAAACTCTC 1814

GTACCCTATG CGATTCGCAT AGGCATATTT TCTAAAAGCT CTGAAGTCAA CATCACTCTC 1874 

AAACACTTCT CCGTTGTAGG TTACTTTCAT CTACAGATAA AGTCATCCAC CGGTTAACAT 1934 

CATGAAGAGA AGTGTGCCCC AGGACTTTAA TCTTGTGTAT CCGTACAAGG CTAAGAGGCC 1994

CAACATCATG CCGCCCTTTT TTGACCGCAA TGGCTTTGTT GAAAACCAAG AAGCCACGCT 2054 

AGCCATGCTT GTGGAAAAGC CGCTCACGTT CGACAAGGAA GGTGCGCTGA CCCTGGGCGT 2114

CGGACGCGGC ATCCGCATTA ACCCCGCGGG GCTTCTGGAG ACAAACGACC TCGCGTCCGC 2174 

TGTCTTCCCA CCGCTGGCCT CCGATGAGGC CGGCAACGTC ACGCTCAACA TGTCTGACGG 2234 

GCTATATACT AAGGACAACA AGCTAGCTGT CAAAGTAGGT CCCGGGCTGT CCCTCGACTC 2294 

CAATAATGCT CTCCAGGTCC ACACAGGCGA CGGGCTCACG GTAACCGATG ACAAGGTGTC 2354 

TCTAAATACC CAAGCTCCCC TCTCGACCAC CAGCGCGGGC CTCTCCCTAC TTCTGGGTCC 2414

CAGCCTCCAC TTAGGTGAGG AGGAACGACT AACAGTAAAC ACCGGAGCGG GCCTCCAAAT 2474 

TAGCAATAAC GCTCTGGCCG TAAAAGTAGG TTCAGGTATC ACCGTAGATG CTCAAAACCA 2534

GCTCGCTGCA TCCCTGGGGG ACGGTCTAGA AAGCAGAGAT AATAAAACTG TCGTTAAGGC 2594 

TGGGCCCGGA CTTACAATAA CTAATCAAGC TCTTACTGTT GCTACCGGGA ACGGCCTTCA 2654 

GGTCAACCCG GAAGGGCAAC TGCAGCTAAA CATTACTGCC GGTCAGGGCC TCAACTTTGC 2714

AAACAACAGC CTCGCCGTGG AGCTGGGCTC GGGCCTGCAT TTTCCCCCTG GCCAAAACCA 2774 

AGTAAGCCTT TATCCCGGAG ATGGAATAGA CATCCGAGAT AATAGGGTGA CTGTGCCCGC 2834 

TGGGCCAGGC CTGAGAATGC TCAACCACCA ACTTGCCGTA GCTTCCGGAG ACGGTTTAGA 2894 

AGTCCACAGC GACACCCTCC GGTTAAAGCT CTCCCACGGC CTGACATTTG AAAATGGCGC 2954 

CGTACGAGCA AAACTAGGAC CAGGACTTGG CACAGACGAC TCTGGTCGGT CCGTGGTTCG 3014

CACAGGTCGA GGACTTAGAG TTGCAAACGG CCAAGTCCAG ATCTTCAGCG GAAGAGGCAC 3074

CGCCATCGGC ACTGATAGCA GCCTCACTCT CAACATCCGG GCGCCCCTAC AATTTTCTGG 3134

ACCCGCCTTG ACTGCTAGTT TGCAAGGCAG TGGTCCGATT ACTTACAACA GCAACAATGG 3194

CACTTTCGGT CTCTCTATAG GCCCCGGAAT GTGGGTAGAC CAAAACAGAC TTCAGGTAAA 3254

CCCAGGCGCT GGTTTAGTCT TCCAAGGAAA CAACCTTGTC CCAAACCTTG CGGATCCGCT 3314

GGCTATTTCC GACAGCAAAA TTAGTCTCAG TCTCGGTCCC GGCCTGACCC AAGCTTCCAA 3374

CGCCCTGACT TTAAGTTTAG GAAACGGGCT TGAATTCTCC AATCAAGCCG TTGCTATAAA 3434

AGCGGGCCGG GGCTTACGCT TTGAGTCTTC CTCACAAGCT TTAGAGAGCA CCCTCACAGT 3494

CGGAAATGGC TTAACGCTTA CCGATACTGT GATCCGCCCC AACCTAGGGG ACGGCCTAGA 3554

GGTCAGAGAC AATAAAATCA TTGTTAAGCT GGGCGCGAAT CTTCGTTTTG AAAACGGAGC 3614

CGTAACCGCC GGCACCGTTA ACCCTTCTGC GCCCGAGGCA CCACCAACTC TCACTGCAGA 3674

ACCACCCCTC CGAGCCTCCA ACTCCCATCT TCAACTGTCC CTATCGGAGG GCTTGGTTGT 3734

GCATAACAAC GCCCTTGCTC TCCAACTGGG AGACGGCATG GAAGTAAATC AGCACGGACT 3794

TACTTTAAGA GTAGGCTCGG GTTTGCAAAT GCGTGACGGC ATTTTAACAG TTACACCCAG 3854

CGGCACTCCT ATTGAGCCCA GACTGACTGC CCCACTGACT CAGACAGAGA ATGGAATCGG 3914

GCTCGCTCTC GGCGCCGGCT TGGAATTAGA CGAGAGCGCG CTCCAAGTAA AAGTTGGGCC 3974

CGGCATGCGC CTGAACCCTG TAGAAAAGTA TGTAACCCTG CTCCTGGGTC CTGGCCTTAG 4034

TTTTGGGCAG CCGGCCAACA GGACAAATTA TGATGTGCGC GTTTCTGTGG AGCCCCCCAT 4094

GGTTTTCGGA CAGCGTGGTC AGCTCACATT TTTAGTGGGT CACGGACTAC ACATTCAAAA 4154

TTCCAAACTT CAGCTCAATT TGGGACAAGG CCTCAGAACT GACCCCGTCA CCAACCAGCT 4214

GGAAGTGCCC CTCGGTCAAG GTTTGGAAAT TGCAGACGAA TCCCAGGTTA GGGTTAAATT 4274

GGGCGATGGC CTGCAGTTTG ATTCACAAGC TCGCATCACT ACCGCTCCTA ACATGGTCAC 4334

TGAAACTCTG TGGACCGGAA CAGGCAGTAA TGCTAATGTT AGATGGCGGG GCTACACTGC 4394

CCCCGGCAGC AAACTCTTTT TGAGTCTCAC TCGGTTCAGC ACTGGTCTAG TTTTAGGAAA 4454

CATGACTATT GACAGCAATG CATCCTTTGG GCAATACATT AACGCGGGAC ACGAACAGAT 4514

CGAATGCTTT ATATTGTTGG ACAATCAGGG TAACCTAAAA GAAGGATCTA ACTTGCAAGG 4574

CACTTGGGAA GTGAAGAACA ACCCCTCTGC TTCCAAAGCT GCTTTTTTGC CTTCCACCGC 4634

CCTATACCCC ATCCTCAACG AAAGCCGAGG GAGTCTTCCT GGAAAAAATC TTGTGGGCAT 4694

GCAAGCCATA CTGGGAGGCG GGGGCACTTG CACTGTGATA GCCACCCTCA ATGGCAGACG 4754

CAGCAACAAC TATCCCGCGG GCCAGTCCAT AATTTTCGTG TGGCAAGAAT TCAACACCAT 4814

AGCCCGCCAA CCTCTGAACC ACTCTACACT TACTTTTTCT TACTGGACTT AAATAAGTTG 4874

GAAATAAAGA GTTAAACTGA ATGTTTAAGT GCAACAGACT TTTATTGGTT TTGGCTCACA 4934

ACAAATTACA ACAGCATAGA CAAGTCATAC CGGTCAAACA ACACAGGCTC TCGAAAACGG 4994

GCTAACCGCT CCAAGAATCT GTCACGCAGA CGAGCAAGTC CTAAATGTTT TTTCACTCTC 5054

TTCGGGGCCA AGTTCAGCAT GTATCGGATT TTCTGCTTAC ACCTTT 5100



(2) INFORMATION FOR SEQ ID NO:20:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 142 amino acids 
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

Phe Met Leu Gly Pro Leu Ala Ser Met Leu Gln Gly Pro Gly Leu Ser
1 5 10 15 

Leu Thr Arg Leu Met Phe Val Thr Met Glu Ala Ser Gln Ile Thr Thr
20 25 30 

Pro Pro Arg Phe Cys Gly Trp Leu Glu Arg Pro Met Ala Pro Gly Ser 
35 40 45 

Leu Phe Phe Ser Val Thr Thr Ala Pro Gln Leu Gln Pro Pro Gly Ser
50 55 60 

Ser Arg Thr Leu Leu Thr Thr Thr Ala Ala Leu Trp Cys Pro Ser Ile
65 70 75 80 

Thr Ser Ser Thr Thr His Ser Leu Ser Ser Ala Ala His Thr Gly Thr 
85 90 95 

Thr Ser Val Leu Ser Leu Pro Ala Asn Lys Leu Thr Ser Leu Pro Val 
100 105 110 

Thr Ser Pro Ala Ser Arg Ser Pro Ser Ala Ser Pro Pro Arg Trp Glu
115 120 125 

Leu Pro Thr Lys Gln Ser Leu Gly Phe Phe Lys Met Tyr Pro
130 135 140 

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5100 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)




(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 1246..1707



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

CCTCATCAAA CAACCCGTGG TGGGCACCAC CCACGTGGAA ATGCCTCGCA ACGAAGTCCT 60 

AGAACAACAT CTGACCTCAC ATGGCGCTCA AATCGCGGGC GGAGGCGCTG CGGGCGATTA 120 

CTTTAAAAGC CCCACTTCAG CTCGAACCCT TATCCCGCTC ACCGCCTCCT GCTTAAGACC 180 

AGATGGAGTC TTTCAACTAG GAGGAGGCTC GCGTTCATCT TTCAACCCCC TGCAAACAGA 240

TTTTGCCTTC CACGCCCTGC CCTCCAGACC GCGCCACGGG GGCATAGGAT CCAGGCAGTT 300 

TGTAGAGGAA TTTGTGCCCG CCGTCTACCT CAACCCCTAC TCGGGACCGC CGGACTCTTA 360 

TCCGGACCAG TTTATACGCC ACTACAACGT GTACAGCAAC TCTGTGAGCG GTTATAGCTG 420 

AGATTGTAAG ACTCTCCTAT CTGTCTCTGT GCTGCTTTTC CGCTTCAAGC CCCACAAGCA 480 

TGAAGGGGTT TCTGCTCATC TTCAGCCTGC TTGTGCATTG TCCCCTAATT CATGTTGGGA 540

CCATTAGCTT CTATGCTGCA AGGCCCGGGT CTGAGCCTAA CGCGACTTAT GTTTGTGACT 600 

ATGGAAGCGA GTCAGATTAC AACCCCACCA CGGTTCTGTG GTTGGCTCGA GAGACCGATG 660 

GCTCCTGGAT CTCTGTTCTT TTCCGTCACA ACGGCTCCTC AACTGCAGCC CCCGGGGTCG 720 

TCGCGCACTT TACTGACCAC AACAGCAGCA TTGTGGTGCC CCAGTATTAC CTCCTCAACA 780 
ACTCACTCTC TAAGCTCTGC TGCTCATACC GGCACAACGA GCGTTCTCAG TTTACCTGCA 840

AACAAGCTGA CGTCCCTACC TGTCACGAGC CCGGCAAGCC GCTCACCCTC CGCGTCTCCC 900

CCGCGCTGGG AACTGCCCAC CAAGCAGTCA CTTGGTTTTT TCAAAATGTA CCCATAGCTA 960

CTGTTTACCG ACCTTGGGGC AATGTAACTT GGTTTTGTCC TCCCTTCATG TGTACCTTTA 1020

ATGTCAGCCT GAACTCCCTA CTTATTTACA ACTTTTCTGA CAAAACCGGG GGGCAATACA 1080

CAGCTCTCAT GCACTCCGGA CCTGCTTCCC TCTTTCAGCT CTTTAAGCCA ACGACTTGTG 1140

TCACCAAGGT GGAGGACCCG CCGTATGCCA ACGACCCGGC CTCGCCTGTG TGGCGCCCAC 1200

TGCTTTTTGC CTTCGTCCTC TGCACCGGCT GCGCGGTGTT GTTAA CCG CCT TCG 1254
Pro Pro Ser
1

GTC CAT CGA TTC TAT CCG GTA CCC GAA AGC TTA TCT CAG CCC GCT TTT 1302
Val His Arg Phe Tyr Pro Val Pro Glu Ser Leu Ser Gln Pro Ala Phe
5 10 15 

GGA GTC CCG AGC CCT ATA CCA CCC TCC ACT AAC AGT CCC CCC ATG GAG 1350 
Gly Val Pro Ser Pro Ile Pro Pro Ser Thr Asn Ser Pro Pro Met Glu
20 25 30 35 

CCA GAC GGA GTT CAT GCC GAG CAG CAG TTT ATC CTC AAT CAG ATT TCC 1398 
Pro Asp Gly Val His Ala Glu Gln Gln Phe Ile Leu Asn Gln Ile Ser
40 45 50

TGC GCC AAC ACT GCC CTC CAG CGT CAA AGG GAG GAA CTA GCT TCC CTT 1446 
Cys Ala Asn Thr Ala Leu Gln Arg Gln Arg Glu Glu Leu Ala Ser Leu
55 60 65 

GTC ATG TTG CAT GCC TGT AAG CGT GGC CTC TTT TGT CCA GTC AAA ACT 1494 
Val Met Leu His Ala Cys Lys Arg Gly Leu Phe Cys Pro Val Lys Thr 
70 75 80 

TAC AAG CTC AGC CTC AAC GCC TCG GCC AGC GAG CAC AGC CTG CAC TTT 1542
Tyr Lys Leu Ser Leu Asn Ala Ser Ala Ser Glu His Ser Leu His Phe 
85 90 95 

GAA AAA AGT CCC TCC CGA TTC ACC CTG GTC AAC ACT CAC GCC GGA GCT 1590 
Glu Lys Ser Pro Ser Arg Phe Thr Leu Val Asn Thr His Ala Gly Ala 
100 105 110 115 

TCT GTG CGA GTG GCC CTA CAC CAC CAG GGA GCT TCC GGC AGC ATC CGC 1638 
Ser Val Arg Val Ala Leu His His Gln Gly Ala Ser Gly Ser Ile Arg
120 125 130

TGT TCC TGT TCC CAC GCC GAG TGC CTC CCC GTC CTC CTC AAG ACC CTC 1686 
Cys Ser Cys Ser His Ala Glu Cys Leu Pro Val Leu Leu Lys Thr Leu 
135 140 145 

TGT GCC TTT AAC TTT TTA GAT TAGCTGAAAG CAAATATAAA ATGGTGTGCT 1737 
Cys Ala Phe Asn Phe Leu Asp 
150 

TACCGTAATT CTGTTTTGAC TTGTGTCCTT GATTTCTCCC CCTGCGCCGT AATCCAGTGC 1797

CCCTCTTCAA AACTCTCGTA CCCTATGCGA TTCGCATAGG CATATTTTCT AAAAGCTCTG 1857

AAGTGAAGAT CACTCTCAAA CACTTCTCCG TTGTAGGTTA CTTTCATCTA CAGATAAAGT 1917

CATCCACCGG TTAACATCAT GAAGAGAAGT GTGCCCCAGG ACTTTAATCT TGTGTATCCG 1977

TACAAGGCTA AGAGGCCCAA CATCATGCCG CCCTTTTTTG ACCGCAATGG CTTTGTTGAA 2037

AACCAAGAAG CCACGCTAGC CATGCTTGTG GAAAAGCCGC TCACGTTCGA CAAGGAAGGT 2097

GCGCTGACCC TGGGCGTCGG ACGCGGCATC CGCATTAACC CCGCGGGGCT TCTGGAGACA 2157

AACGACCTCG CGTCCGCTGT CTTCCCACCG CTGGCCTCCG ATGAGGCCGG CAACGTCACG 2217

CTCAACATGT CTGACGGGCT ATATACTAAG GACAACAAGC TAGCTGTCAA AGTAGGTCCC 2277

GGGCTGTCCC TCGACTCCAA TAATGCTCTC CAGGTCCACA CAGGCGACGG GCTCACGGTA 2337

ACCGATGACA AGGTGTCTCT AAATACCCAA GCTCCCCTCT CGACCACCAG CGCGGGCCTC 2397

TCCCTACTTC TGGGTCCCAG CCTCCACTTA GGTGAGGAGG AACGACTAAC AGTAAACACC 2457 

GGAGCGGGCC TCCAAATTAG CAATAACGCT CTGGCCGTAA AAGTAGGTTC AGGTATCACC 2517 

GTAGATGCTC AAAACCAGCT CGCTGCATCC CTGGGGGACG GTCTAGAAAG CAGAGATAAT 2577 

AAAACTGTCG TTAAGGCTGG GCCCGGACTT ACAATAACTA ATCAAGCTCT TACTGTTGCT 2637

ACCGGGAACG GCCTTCAGGT CAACCCGGAA GGGCAACTGC AGCTAAACAT TACTGCCGGT 2697 

CAGGGCCTCA ACTTTGCAAA CAACAGCCTC GCCGTGGAGC TGGGCTCGGG CCTGCATTTT 2757 

CCCCCTGGCC AAAACCAAGT AAGCCTTTAT CCCGGAGATG GAATAGACAT CCGAGATAAT 2817 

AGGGTGACTG TGCCCGCTGG GCCAGGCCTG AGAATGCTCA ACCACCAACT TGCCGTAGCT 2877 

TCCGGAGACG GTTTAGAAGT CCACAGCGAC ACCCTCCGGT TAAAGCTCTC CCACGGCCTG 2937

ACATTTGAAA ATGGCGCCGT ACGAGCAAAA CTAGGACCAG GACTTGGCAC AGACGACTCT 2997 

GGTCGGTCCG TGGTTCGCAC AGGTCGAGGA CTTAGAGTTG CAAACGGCCA AGTCCAGATC 3057

TTCAGCGGAA GAGGCACCGC CATCGGCACT GATAGCAGCC TCACTCTCAA CATCCGGGCG 3117 

CCCCTACAAT TTTCTGGACC CGCCTTGACT GCTAGTTTGC AAGGCAGTGG TCCGATTACT 3177 

TACAACAGCA ACAATGGCAC TTTCGGTCTC TCTATAGGCC CCGGAATGTG GGTAGACCAA 3237

AACAGACTTC AGGTAAACCC AGGCGCTGGT TTAGTCTTCC AAGGAAACAA CCTTGTCCCA 3297 

AACCTTGCGG ATCCGCTGGC TATTTCCGAC AGCAAAATTA GTCTCAGTCT CGGTCCCGGC 3357 

CTGACCCAAG CTTCCAACGC CCTGACTTTA AGTTTAGGAA ACGGGCTTGA ATTCTCCAAT 3417

CAAGCCGTTG CTATAAAAGC GGGCCGGGGC TTACGCTTTG AGTCTTCCTC ACAAGCTTTA 3477 

GAGAGCAGCC TCACAGTCGG AAATGGCTTA ACGCTTACCG ATACTGTGAT CCGCCCCAAC 3537

CTAGGGGACG GCCTAGAGGT CAGAGACAAT AAAATCATTG TTAAGCTGGG CGCGAATCTT 3597 

CGTTTTGAAA ACGGAGCCGT AACCGCCGGC ACCGTTAACC CTTCTGCGCC CGAGGCACCA 3657 

CCAACTCTCA CTGCAGAACC ACCCCTCCGA GCCTCCAACT CCCATCTTCA ACTGTCCCTA 3717 

TCGGAGGGCT TGGTTGTGCA TAACAACGCC CTTGCTCTCC AACTGGGAGA CGGCATGGAA 3777 

GTAAATCAGC ACGGACTTAC TTTAAGAGTA GGCTCGGGTT TGCAAATGCG TGACGGCATT 3837

TTAACAGTTA CACCCAGCGG CACTCCTATT GAGCCCAGAC TGACTGCCCC ACTGACTCAG 3897 

ACAGAGAATG GAATCGGGCT CGCTCTCGGC GCCGGCTTGG AATTAGACGA GAGCGCGCTC 3957 

CAAGTAAAAG TTGGGCCCGG CATGCGCCTG AACCCTGTAG AAAAGTATGT AACCCTGCTC 4017 

CTGGGTCCTG CCCTTAGTTT TGGGCAGCCG GCCAACAGGA CAAATTATGA TGTGCGCGTT 4077

TCTGTGGAGC CCCCCATGGT TTTCGGACAG CGTGGTCAGC TCACATTTTT AGTGGGTCAC 4137

GGACTACACA TTCAAAATTC CAAACTTCAG CTCAATTTGG GACAAGGCCT CAGAACTGAC 4197 

CCCGTCACCA ACCAGCTGGA AGTGCCCCTC GGTGAAGGTT TGGAAATTGC AGACGAATCC 4257 

CAGGTTAGGG TTAAATTGGG CGATGGCCTG CAGTTTGATT CACAAGCTCG CATCACTACC 4317 

GCTCCTAACA TGGTCACTGA AACTCTGTGG ACCGGAACAG GCAGTAATGC TAATGTTACA 4377 

TGGCGGGGCT ACACTGCCCC CGGCAGCAAA CTCTTTTTGA GTCTCACTCG GTTCAGCACT 4437

GGTCTAGTTT TAGGAAACAT GACTATTGAC AGCAATGCAT CCTTTGGGCA ATACATTAAC 4497 

GCGGGACACG AACAGATCGA ATGCTTTATA TTGTTGGACA ATCAGGGTAA CCTAAAAGAA 4557 

GGATCTAACT TGCAAGGCAC TTGGGAAGTG AAGAACAACC CCTCTGCTTC CAAAGCTGCT 4617 

TTTTTGCCTT CCACCGCCCT ATACCCCATC CTCAACGAAA GCCGAGGGAG TCTTCCTGGA 4677 
AAAAATCTTG TGGGCATGCA AGCCATACTG GGAGGCGGGG GCACTTGCAC TGTGATAGCC 4737

ACCCTCAATG GCAGACGCAG CAACAACTAT CCCGCGGGCC AGTCCATAAT TTTCGTGTGG 4797 

CAAGAATTCA ACACCATAGC CCGCCAACCT CTGAACCACT CTACACTTAC TTTTTCTTAC 4857 

TGGACTTAAA TAAGTTGGAA ATAAAGAGTT AAACTGAATG TTTAAGTGCA ACAGACTTTT 4917 

ATTGGTTTTG GCTCACAACA AATTACAACA GCATAGACAA GTCATACCGG TCAAACAACA 4977

CAGGCTCTCG AAAACGGGCT AACCGCTCCA AGAATCTGTC ACGCAGACGA GCAAGTCCTA 5037 

AATGTTTTTT CACTCTCTTC GGGGCCAAGT TCAGCATGTA TCGGATTTTC TGCTTACACC 5097 

TTT 5100 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 154 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Pro Pro Ser Val His Arg Phe Tyr Pro Val Pro Glu Ser Leu Ser Gln
1 5 10 15

Pro Ala Phe Gly Val Pro Ser Pro Ile Pro Pro Ser Thr Asn Ser Pro
20 25 30 

Pro Met Glu Pro Asp Gly Val His Ala Glu Gln Gln Phe Ile Leu Asn
35 40 45 

Gln Ile Ser Cys Ala Asn Thr Ala Leu Gln Arg Gln Arg Glu Glu Leu
50 55 60

Ala Ser Leu Val Met Leu His Ala Cys Lys Arg Gly Leu Phe Cys Pro 
65 70 75 80 

Val Lys Thr Tyr Lys Leu Ser Leu Asn Ala Ser Ala Ser Glu His Ser 
85 90 95 

Leu His Phe Glu Lys Ser Pro Ser Arg Phe Thr Leu Val Asn Thr His 
100 105 110 

Ala Gly Ala Ser Val Arg Val Ala Leu His His Gln Gly Ala Ser Gly
115 120 125 

Ser Ile Arg Cys Ser Cys Ser His Ala Glu Cys Leu Pro Val Leu Leu
130 135 140 

Lys Thr Leu Cys Ala Phe Asn Phe Leu Asp 
145 150 

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5100 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 1439..1702



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

CCTCATCAAA CAACCCGTGG TGGGCACCAC CCACGTGGAA ATGCCTCGCA ACGAAGTCCT 60

AGAACAACAT CTGACCTCAC ATGGCGCTCA AATCGCGGGC GGAGGCGCTG CGGGCGATTA 120

CTTTAAAAGC CCCACTTCAG CTCGAACCCT TATCCCGCTC ACCGCCTCCT GCTTAAGACC 180 

AGATGGAGTC TTTCAACTAG GAGGAGGCTC GCGTTCATCT TTCAACCCCC TGCAAACAGA 240 

TTTTGCCTTC CACGCCCTGC CCTCCAGACC GCGCCACGGG GGCATAGGAT CCAGGCAGTT 300 

TGTAGAGGAA TTTGTGCCCG CCGTCTACCT CAACCCCTAC TCGGGACCGC CGGACTCTTA 360

TCCGGACCAG TTTATACGCC ACTACAACGT GTACAGCAAC TCTGTGAGCG GTTATAGCTG 420 

AGATTGTAAG ACTCTCCTAT CTGTCTCTGT GCTGCTTTTC CGCTTCAAGC CCCACAAGCA 480 

TGAAGGGGTT TCTGCTCATC TTCAGCCTGC TTGTGCATTG TCCCCTAATT CATGTTGGGA 540 

CCATTAGCTT CTATGCTGCA AGGCCCGGGT CTGAGCCTAA CGCGACTTAT GTTTGTGACT 600

ATGGAAGCGA GTCAGATTAC AACCCCACCA CGGTTCTGTG GTTGGCTCGA GAGACCGATG 660

GCTCCTGGAT CTCTGTTCTT TTCCGTCACA ACGGCTCCTC AACTGCAGCC CCCGGGGTCG 720 

TCGCGCACTT TACTGACCAC AACAGCAGCA TTGTGGTGCC CCAGTATTAC CTCCTCAACA 780 

ACTCACTCTC TAAGCTCTGC TGCTCATACC GGCACAACGA GCGTTCTCAG TTTACCTGCA 840 

AACAAGCTGA CGTCCCTACC TGTCACGAGC CCGGCAAGCC GCTCACCCTC CGCGTCTCCC 900 

CCGCGCTGGG AACTGCCCAC CAAGCAGTCA CTTGGTTTTT TCAAAATGTA CCCATAGCTA 960

CTGTTTACCG ACCTTGGGGC AATGTAACTT GGTTTTGTCC TCCCTTCATG TGTACCTTTA 1020 

ATGTCAGCCT GAACTCCCTA CTTATTTACA ACTTTTCTGA CAAAACCGGG GGGCAATACA 1080 

CAGCTCTCAT GCACTCCGGA CCTGCTTCCC TCTTTCAGCT CTTTAAGCCA ACGACTTGTG 1140 

TCACCAAGGT GGAGGACCCG CCGTATGCCA ACGACCCGGC CTCGCCTGTG TGGCGCCCAC 1200 

TGCTTTTTGC CTTCGTCCTC TGCACCGGCT GCGCGGTGTT GTTAACCGCC TTCGGTCCAT 1260

CGATTCTATC CGGTACCCGA AAGCTTATCT CAGCCCGCTT TTGGAGTCCC GAGCCCTATA 1320 

CCACCCTCCA CTAACAGTCC CCCCATGGAG CCAGACGGAG TTCATGCCGA CCAGCAGTTT 1380 

ATCCTCAATC AGATTTCCTG CGCCAACACT GCCCTCCAGC GTCAAAGGGA GGAACTAG 1438 

CTT CCC TTG TCA TGT TGC ATG CCT GTA AGC GTG GCC TCT TTT GTC CAG 1486
Leu Pro Leu Ser Cys Cys Met Pro Val Ser Val Ala Ser Phe Val Gln
1 5 10 15

TCA AAA CTT ACA AGC TCA GCC TCA ACG CCT CGG CCA GCG AGC ACA GCC 1534 
Ser Lys Leu Thr Ser Ser Ala Ser Thr Pro Arg Pro Ala Ser Thr Ala 
20 25 30 

TGC ACT TTG AAA AAA GTC CCT CCC GAT TCA CCC TGG TCA ACA CTC ACG 1582 
Cys Thr Leu Lys Lys Val Pro Pro Asp Ser Pro Trp Ser Thr Leu Thr 
35 40 45 

CCG GAG CTT CTG TGC GAG TGG CCC TAC ACC ACC AGG GAG CTT CCG GCA 1630
Pro Glu Leu Leu Cys Glu Trp Pro Tyr Thr Thr Arg Glu Leu Pro Ala 
50 55 60 

GCA TCC GCT GTT CCT GTT CCC ACG CCG AGT GCC TCC CCG TCC TCC TCA 1678 
Ala Ser Ala Val Pro Val Pro Thr Pro Ser Ala Ser Pro Ser Ser Ser 
65 70 75 80 

AGA CCC TCT GTG CCT TTA ACT TTT TAGATTAGCT GAAAGCAAAT ATAAAATGGT 1732 
Arg Pro Ser Val Pro Leu Thr Phe 
85

GTGCTTACCG TAATTCTGTT TTGACTTGTG TGCTTGATTT CTCCCCCTGC GCCGTAATCC 1792 

AGTGCCCCTC TTCAAAACTC TCGTACCCTA TGCGATTCGC ATAGGCATAT TTTCTAAAAG 1852 

CTCTGAAGTC AACATCACTC TCAAACACTT CTCCGTTGTA GGTTACTTTC ATCTACAGAT 1912 

AAAGTCATCC ACCGGTTAAC ATCATGAAGA GAAGTGTGCC CCAGGACTTT AATCTTGTGT 1972 
ATCCGTACAA GGCTAAGAGG CCCAACATCA TGCCGCCCTT TTTTGACCGC AATGGCTTTG 2032

TTGAAAACCA AGAAGCCACG CTAGCCATGC TTGTGGAAAA GCCGCTCACG TTCGACAAGG 2092 

AAGGTGCGCT GACCCTGGGC GTCGGACGCG GCATCCGCAT TAACCCCGCG GGGCTTCTGG 2152 

AGACAAACGA CCTCGCGTCC GCTGTCTTCC CACCGCTGGC CTCCGATGAG GCCGGCAACG 2212 

TCACGCTCAA CATGTCTGAC GGGCTATATA CTAAGGACAA CAAGCTAGCT GTCAAAGTAG 2272

GTCCCGGGCT GTCCCTCGAC TCCAATAATG CTCTCCAGGT CCACACAGGC GACGGGCTCA 2332

CGGTAACCGA TGACAAGGTG TCTCTAAATA CCCAAGCTCC CCTCTCGACC ACCAGCGCGG 2392 

GCCTCTCCCT ACTTCTGGGT CCCAGCCTCC ACTTAGGTGA GGAGGAACGA CTAACAGTAA 2452 

ACACCGGAGC GGGCCTCCAA ATTAGCAATA ACGCTCTCGC CGTAAAAGTA GGTTCAGGTA 2512 

TCACCGTAGA TGCTCAAAAC CAGCTCGCTG CATCCCTGGG GGACGGTCTA GAAAGCAGAG 2572

ATAATAAAAC TGTCGTTAAG GCTGGGCCCG GACTTACAAT AACTAATCAA GCTCTTACTG 2632 

TTGCTACCGG GAACGGCCTT CAGGTCAACC CGGAAGGGCA ACTGCAGCTA AACATTACTG 2692 

CCGGTCAGGG CCTCAACTTT GCAAACAACA GCCTCCCCGT GGAGCTGGGC TCGGGCCTGC 2752

ATTTTCCCCC TGGCCAAAAC CAAGTAAGCC TTTATCCCGG AGATGGAATA GACATCCGAG 2812 

ATAATAGGGT GACTGTGCCC GCTGGGCCAG GCCTGAGAAT GCTCAACCAC CAACTTGCCG 2872

TAGCTTCCGG AGACGGTTTA GAAGTCCACA GCGACACCCT CCGGTTAAAG CTCTCCCACG 2932 

GCCTGACATT TGAAAATGGC GCCGTACGAG CAAAACTAGG ACCAGGACTT GGCACAGACG 2992 

ACTCTGGTCG GTCCGTGGTT CGCACAGGTC GAGGACTTAG AGTTGCAAAC GGCCAAGTCC 3052 

AGATCTTCAG CGGAAGAGGC ACCGCCATCG GCACTGATAG CAGCCTCACT CTCAACATCC 3112 

GGGCGCCCCT ACAATTTTCT GGACCCGCCT TGACTGCTAG TTTGCAAGGC AGTGGTCCGA 3172

TTACTTACAA CAGCAACAAT GGCACTTTCG GTCTCTCTAT AGGCCCCGGA ATGTGGGTAG 3232 

ACCAAAACAG ACTTCAGGTA AACCCAGGCG CTGGTTTAGT CTTCCAAGGA AACAACCTTG 3292 

TCCCAAACCT TGCGGATCCG CTGGCTATTT CCGACAGCAA AATTAGTCTC AGTCTCGGTC 3352 

CCGGCCTGAC CCAAGCTTCC AACGCCCTGA CTTTAAGTTT AGGAAACGGG CTTGAATTCT 3412 

CCAATCAAGC CGTTGCTATA AAAGCGGGCC GGGGCTTACG CTTTGAGTCT TCCTCACAAG 3472

CTTTAGAGAG CAGCCTCACA GTCGGAAATG GCTTAACGCT TACCGATACT GTGATCCGCC 3532 

CCAACCTAGG GGACGGCCTA GAGGTCAGAG ACAATAAAAT CATTGTTAAG CTGGGCGCGA 3592 

ATCTTCGTTT TGAAAACCGA GCCGTAACCG CCGGCACCGT TAACCCTTCT GCGCCCGAGG 3652 

CACCACCAAC TCTCACTGCA GAACCACCCC TCCGAGCCTC CAACTCCCAT CTTCAACTGT 3712 

CCCTATCGGA GGGCTTGGTT GTGCATAACA ACGCCCTTGC TCTCCAACTG GGAGACGGCA 3772

TGGAAGTAAA TCAGCACGGA CTTACTTTAA GAGTAGGCTC GGGTTTGCAA ATGCGTGACG 3832 

GCATTTTAAC AGTTACACCC AGCGGCACTC CTATTGAGCC CAGACTGACT GCCCCACTGA 3892 

CTCAGACAGA GAATGGAATC GGGCTCGCTC TCGGCGCCGG CTTGGAATTA GACGAGAGCG 3952 

CGCTCCAAGT AAAAGTTGGG CCCGGCATGC GCCTGAACCC TGTAGAAAAG TATGTAACCC 4012 

TGCTCCTGGG TCCTGGCCTT AGTTTTGGGC AGCCGGCCAA CAGGACAAAT TATGATGTGC 4072

GCGTTTCTGT GGAGCCCCCC ATGGTTTTCG GACAGCGTGG TCAGCTCACA TTTTTAGTGG 4132 

GTCACGGACT ACACATTCAA AATTCCAAAC TTCAGCTCAA TTTGGGACAA GGCCTCAGAA 4192 

CTGACCCCGT CACCAACCAG CTGGAAGTGC CCCTCGGTCA AGGTTTGGAA ATTGCAGACG 4252 

AATCCCAGGT TAGGGTTAAA TTGGGCGATG GCCTGCAGTT TGATTCACAA GCTCGCATCA 4312 

CTACCGCTCC TAACATGGTC ACTGAAACTC TGTGGACCGG AACAGGCAGT AATGCTAATG 4372

TTACATGGCG GGGCTACACT GCCCCCGGCA GCAAACTCTT TTTGAGTCTC ACTCGGTTCA 4432

GCACTGGTCT AGTTTTAGGA AACATGACTA TTGACAGCAA TGCATCCTTT GGGCAATACA 4492

TTAACGCGGG ACACGAACAG ATCGAATGCT TTATATTGTT GGACAATCAG GGTAACCTAA 4552

AAGAAGGATC TAACTTGCAA GGCACTTGGG AAGTGAAGAA CAACCCCTCT GCTTCCAAAG 4612

CTGCTTTTTT GCCTTCCACC GCCCTATACC CCATCCTCAA CGAAAGCCGA GGGAGTCTTC 4672

CTGGAAAAAA TCTTGTGGGC ATGCAAGCCA TACTGGGAGG CGGGGGCACT TGCACTGTGA 4732

TAGCCACCCT CAATGGCAGA CGCAGCAACA ACTATCCCGC GGGCCAGTCC ATAATTTTCG 4792

TGTGGCAAGA ATTCAACACC ATAGCCCGCC AACCTCTGAA CCACTCTACA CTTACTTTTT 4852

CTTACTGGAC TTAAATAAGT TGGAAATAAA GAGTTAAACT GAATGTTTAA GTGCAACAGA 4912

CTTTTATTGG TTTTGGCTCA CAACAAATTA CAACAGCATA GACAAGTCAT ACCGGTCAAA 4972

CAACACAGGC TCTCGAAAAC GGGCTAACCG CTCCAAGAAT CTGTCACGCA GACGAGCAAG 5032

TCCTAAATGT TTTTTCACTC TCTTCGGGGC CAAGTTCAGC ATGTATCGGA TTTTCTGCTT 5092

ACACCTTT 5100



(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 88 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

Leu Pro Leu Ser Cys Cys Met Pro Val Ser Val Ala Ser Phe Val Gln
1 5 10 15

Ser Lys Leu Thr Ser Ser Ala Ser Thr Pro Arg Pro Ala Ser Thr Ala 
20 25 30 



Cys Thr Leu Lys Lys Val Pro Pro Asp Ser Pro Trp Ser Thr Leu Thr 
35 40 45 

Pro Glu Leu Leu Cys Glu Trp Pro Tyr Thr Thr Arg Glu Leu Pro Ala 
50 55 60 

Ala Ser Ala Val Pro Val Pro Thr Pro Ser Ala Ser Pro Ser Ser Ser 
65 70 75 80 

Arg Pro Ser Val Pro Leu Thr Phe 
85 

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5100 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE:

(A) NAME /KEY: CDS 

(B) LOCATION: 1915..4863



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

CCTCATCAAA CAACCCGTGG TGGGCACCAC CCACGTGGAA ATGCCTCGCA ACGAAGTCCT 60

AGAACAACAT CTGACCTCAC ATGGCGCTCA AATCGCGGGC GGAGGCGCTG CGGGCGATTA 120

CTTTAAAAGC CCCACTTCAG CTCGAACCCT TATCCCGCTC ACCGCCTCCT GCTTAAGACC 180 

AGATGGAGTC TTTCAACTAG GAGGAGGCTC GCGTTCATCT TTCAACCCCC TGCAAACAGA 240 

TTTTGCCTTC CACGCCCTGC CCTCCAGACC GCGCCACGGG GGCATAGGAT CCAGGCAGTT 300 

TGTAGAGGAA TTTGTGCCCG CCGTCTACCT CAACCCCTAC TCGGGACCGC CGGACTCTTA 360

TCCGGACCAG TTTATACGCC ACTACAACGT GTACAGCAAC TCTGTGAGCG GTTATAGCTG 420 

AGATTGTAAG ACTCTCCTAT CTGTCTCTGT GCTGCTTTTC CGCTTCAAGC CCCACAAGCA 480 

TGAAGGGGTT TCTGCTCATC TTCAGCCTGC TTGTGCATTG TCCCCTAATT CATGTTGGGA 540 

CCATTAGCTT CTATGCTGCA AGGCCCGGGT CTGAGCCTAA CGCGACTTAT GTTTGTGACT 600 

ATGGAAGCGA GTCAGATTAC AACCCCACCA CGGTTCTGTG GTTGGCTCGA GAGACCGATG 660

GCTCCTGGAT CTCTGTTCTT TTCCGTCACA ACGGCTCCTC AACTGCAGCC CCCGGGGTCG 720 

TCGCGCACTT TACTGACCAC AACAGCAGCA TTGTGGTGCC CCAGTATTAC CTCCTCAACA 780

ACTCACTCTC TAAGCTCTGC TGCTCATACC GGCACAACGA GCGTTCTCAG TTTACCTGCA 840 

AACAAGCTGA CGTCCCTACC TGTCACGAGC CCGGCAAGCC GCTCACCCTC CGCGTCTCCC 900

CCGCGCTGGG AACTGCCCAC CAAGCAGTCA CTTGGTTTTT TCAAAATGTA CCCATAGCTA 960

CTGTTTACCG ACCTTGGGGC AATGTAACTT GGTTTTGTCC TCCCTTCATG TGTACCTTTA 1020 

ATGTCAGCCT GAACTCCCTA CTTATTTACA ACTTTTCTGA CAAAACCGGG GGGCAATACA 1080 

CAGCTCTCAT GCACTCCGGA CCTGCTTCCC TCTTTCAGCT CTTTAAGCCA ACGACTTGTG 1140 

TCACCAAGGT GGAGGACCCG CCGTATGCCA ACGACCCGGC CTCGCCTGTG TGGCGCCCAC 1200 

TGCTTTTTGC CTTCGTCCTC TGCACCGGCT GCGCGGTGTT GTTAACCGCC TTCGGTCCAT 1260

CGATTCTATC CGGTACCCGA AAGCTTATCT CAGCCCGCTT TTGGAGTCCC GAGCCCTATA 1320 

CCACCCTCCA CTAACAGTCC CCCCATGGAG CCAGACGGAG TTCATGCCGA GCAGCAGTTT 1380 

ATCCTCAATC AGATTTCCTG CGCCAACACT GCCCTCCAGC GTCAAAGGGA GGAACTAGCT 1440 

TCCCTTGTCA TGTTGCATGC CTGTAAGCGT GGCCTCTTTT GTCCAGTCAA AACTTACAAG 1500 

CTCAGCCTCA ACGCCTCGGC CAGCGAGCAC AGCCTGCACT TTGAAAAAAG TCCCTCCCGA 1560

TTCACCCTGG TCAACACTCA CGCCGGAGCT TCTGTGCGAG TGGCCCTACA CCACCAGGGA 1620

GCTTCCGGCA GCATCCGCTG TTCCTGTTCC CACGCCGAGT GCCTCCCCGT CCTCCTCAAG 1680 

ACCCTCTGTG CCTTTAACTT TTTAGATTAG CTGAAAGCAA ATATAAAATG GTGTGCTTAC 1740 

CGTAATTCTG TTTTGACTTG TGTGCTTGAT TTCTCCCCCT GCGCCGTAAT CCAGTGCCCC 1800 

TCTTCAAAAC TCTCGTACCC TATGCGATTC GCATAGGCAT ATTTTCTAAA AGCTCTGAAG 1860

TCAACATCAC TCTCAAACAC TTCTCCGTTG TAGGTTACTT TCATCTACAG ATAA ACT 1917

Ser 
1 

CAT CCA CCG GTT AAC ATC ATG AAG AGA AGT GTG CCC CAG GAC TTT AAT 1965 
His Pro Pro Val Asn Ile Met Lys Arg Ser Val Pro Gln Asp Phe Asn
5 10 15 

CTT GTG TAT CCG TAC AAG GCT AAG AGG CCC AAC ATC ATG CCG CCC TTT 2013
Leu Val Tyr Pro Tyr Lys Ala Lys Arg Pro Asn Ile Met Pro Pro Phe
20 25 30 

TTT GAC CGC AAT GGC TTT GTT GAA AAC CAA GAA GCC ACG CTA GCC ATG 2061 
Phe Asp Arg Asn Gly Phe Val Glu Asn Gln Glu Ala Thr Leu Ala Met
35 40 45 

CTT GTG GAA AAG CCG CTC ACG TTC GAC AAG GAA GGT GCG CTG ACC CTG 2109
Leu Val Glu Lys Pro Leu Thr Phe Asp Lys Glu Gly Ala Leu Thr Leu
50 55 60 65



GGC GTC GGA CGC GGC ATC CGC ATT AAC CCC GCG GGG CTT CTG GAG ACA 2157
Gly Val Gly Arg Gly Ile Arg Ile Asn Pro Ala Gly Leu Leu Glu Thr
70 75 80

AAC GAC CTC GCG TCC GCT GTC TTC CCA CCG CTG GCC TCC GAT GAG GCC 2205
Asn Asp Leu Ala Ser Ala Val Phe Pro Pro Leu Ala Ser Asp Glu Ala
85 90 95

GGC AAC GTC ACG CTC AAC ATG TCT GAC GGG CTA TAT ACT AAG GAC AAC 2253
Gly Asn Val Thr Leu Asn Met Ser Asp Gly Leu Tyr Thr Lys Asp Asn
100 105 110

AAG CTA GCT GTC AAA GTA GGT CCC GGG CTG TCC CTC GAC TCC AAT AAT 2301
Lys Leu Ala Val Lys Val Gly Pro Gly Leu Ser Leu Asp Ser Asn Asn
115 120 125

GCT CTC CAG GTC CAC ACA GGC GAC GGG CTC ACG GTA ACC GAT GAC AAG 2349
Ala Leu Gln Val His Thr Gly Asp Gly Leu Thr Val Thr Asp Asp Lys
130 135 140 145

GTG TCT CTA AAT ACC CAA GCT CCC CTC TCG ACC ACC AGC GCG GGC CTC 2397
Val Ser Leu Asn Thr Gln Ala Pro Leu Ser Thr Thr Ser Ala Gly Leu
150 155 160

TCC CTA CTT CTG GGT CCC AGC CTC CAC TTA GGT GAG GAG GAA CGA CTA 2445
Ser Leu Leu Leu Gly Pro Ser Leu His Leu Gly Glu Glu Glu Arg Leu
165 170 175

ACA GTA AAC ACC GGA GCG GGC CTC CAA ATT AGC AAT AAC GCT CTG GCC 2493
Thr Val Asn Thr Gly Ala Gly Leu Gln Ile Ser Asn Asn Ala Leu Ala
180 185 190

GTA AAA GTA GGT TCA GGT ATC ACC GTA GAT GCT CAA AAC CAG CTC GCT 2541
Val Lys Val Gly Ser Gly Ile Thr Val Asp Ala Gln Asn Gln Leu Ala
195 200 205

GCA TCC CTG GGG GAC GGT CTA GAA AGC AGA GAT AAT AAA ACT GTC GTT 2589
Ala Ser Leu Gly Asp Gly Leu Glu Ser Arg Asp Asn Lys Thr Val Val
210 215 220 225

AAG GCT GGG CCC GGA CTT ACA ATA ACT AAT CAA GCT CTT ACT GTT GCT 2637
Lys Ala Gly Pro Gly Leu Thr Ile Thr Asn Gln Ala Leu Thr Val Ala
230 235 240

ACC GGG AAC GGC CTT CAG GTC AAC CCG GAA GGG CAA CTG CAG CTA AAC 2685
Thr Gly Asn Gly Leu Gln Val Asn Pro Glu Gly Gln Leu Gln Leu Asn
245 250 255

ATT ACT GCC GGT CAG GGC CTC AAC TTT GCA AAC AAC AGC CTC GCC GTG 2733
Ile Thr Ala Gly Gln Gly Leu Asn Phe Ala Asn Asn Ser Leu Ala Val
260 265 270

GAG CTG GGC TCG GGC CTG CAT TTT CCC CCT GGC CAA AAC CAA GTA AGC 2781
Glu Leu Gly Ser Gly Leu His Phe Pro Pro Gly Gln Asn Gln Val Ser
275 280 285

CTT TAT CCC GGA GAT GGA ATA GAC ATC CGA GAT AAT AGG GTG ACT GTG 2829
Leu Tyr Pro Gly Asp Gly Ile Asp Ile Arg Asp Asn Arg Val Thr Val
290 295 300 305

CCC GCT GGG CCA GGC CTG AGA ATG CTC AAC CAC CAA CTT GCC GTA GCT 2877
Pro Ala Gly Pro Gly Leu Arg Met Leu Asn His Gln Leu Ala Val Ala
310 315 320

TCC GGA GAC GGT TTA GAA GTC CAC AGC GAC ACC CTC CGG TTA AAG CTC 2925
Ser Gly Asp Gly Leu Glu Val His Ser Asp Thr Leu Arg Leu Lys Leu
325 330 335

TCC CAC GGC CTG ACA TTT GAA AAT GGC GCC GTA CGA GCA AAA CTA GGA 2973
Ser His Gly Leu Thr Phe Glu Asn Gly Ala Val Arg Ala Lys Leu Gly
340 345 350

CCA GGA CTT GGC ACA GAC GAC TCT GGT CGG TCC GTG GTT CGC ACA GGT 3021
Pro Gly Leu Gly Thr Asp Asp Ser Gly Arg Ser Val Val Arg Thr Gly
355 360 365

CGA GGA CTT AGA GTT GCA AAC GGC CAA GTC CAG ATC TTC AGC GGA AGA 
Arg Gly Leu Arg Val Ala Asn Gly Gln Val Gln Ile Phe Ser Gly Arg
370 375 380 385 

GGC ACC GCC ATC GGC ACT GAT AGC AGC CTC ACT CTC AAC ATC CGG GCG 
Gly Thr Ala Ile Gly Thr Asp Ser Ser Leu Thr Leu Asn Ile Arg Ala
390 395 400

CCC CTA CAA TTT TCT GGA CCC GCC TTG ACT GCT AGT TTG CAA GGC AGT
Pro Leu Gln Phe Ser Gly Pro Ala Leu Thr Ala Ser Leu Gln Gly Ser
405 410 415 

GGT CCG ATT ACT TAC AAC AGC AAC AAT GGC ACT TTC GGT CTC TCT ATA 
Gly Pro lie Thr Tyr Asn Ser Asn Asn Gly Thr Phe Gly Leu Ser lie 
420 425 430 

10 GGC CCC GGA ATG TGG GTA GAC CAA AAC AGA CTT CAG GTA AAC CCA GGC 
Gly Pro Gly Met Trp Val Asp Gin Asn Arg Leu Gin Val Asn Pro Gly 
435 440 445 

GCT GGT TTA GTC TTC CAA GGA AAC AAC CTT GTC CCA AAC CTT GCG GAT 
Ala Gly Leu Val Phe Gin Gly Asn Asn Leu Val Pro Asn Leu Ala Asp 
450 455 460 465 

CCG CTG GCT ATT TCC GAC AGC AAA ATT AGT CTC AGT CTC GGT CCC GGC 
Pro Leu Ala lie Ser Asp Ser Lys He Ser Leu Ser Leu Gly Pro Gly 
15 470 475 480 

CTG ACC CAA GCT TCC AAC GCC CTG ACT TTA AGT TTA GGA AAC GGG CTT 
Leu Thr Gin Ala Ser Asn Ala Leu Thr Leu Ser Leu Gly Asn Gly Leu 
485 490 495 

GAA TTC TCC AAT CAA GCC GTT GCT ATA AAA GCG GGC CGG GGC TTA CGC 
Glu Phe Ser Asn Gin Ala Val Ala lie Lys Ala Gly Arg Gly Leu Arg 
500 505 510 

20 TTT GAG TCT TCC TCA CAA GCT TTA GAG AGC AGC CTC ACA GTC GGA AAT 
Phe Glu Ser Ser Ser Gin Ala Leu Glu Ser Ser Leu Thr Val Gly Asn 
515 520 525 

GGC TTA ACG CTT ACC GAT ACT GTG ATC CGC CCC AAC CTA GGG GAC GGC 
Gly Leu Thr Leu Thr Asp Thr Val He Arg Pro Asn Leu Gly Asp Gly 
530 535 540 545 

CTA GAG GTC AGA GAC AAT AAA ATC ATT GTT AAG CTG GGC GCG AAT CTT 
Leu Glu Val Arg Asp Asn Lys He lie Val Lys Leu Gly Ala Asn Leu 
2 5 550 555 560 

CGT TTT GAA AAC GGA GCC GTA ACC GCC GGC ACC GTT AAC CCT TCT GCG 
Arg Phe Glu Asn Gly Ala Val Thr Ala Gly Thr Vb Asn Pro Ser Ala 
565 570 575 

CCC GAG GCA CCA CCA ACT CTC ACT GCA GAA CCA CCC CTC CGA GCC TCC 
Pro Glu Ala Pro Pro Thr Leu Thr Ala Glu Pro Pro Leu Arg Ala Ser 
580 585 590 

30 AAC TCC CAT CTT CAA CTG TCC CTA TCG GAG GGC TTG GTT GTG CAT AAC 
Asn Ser His Leu Gin Leu Ser Leu Ser Glu Gly Leu Val Val His Asn 
595 600 605 



35 



AAC GCC CTT GCT CTC CAA CTG GGA GAC GGC ATG GAA GTA AAT CAG CAC 
Asn Ala Leu Ala Leu Gin Leu Gly Asp Gly Net Glu Val Asn Gin His 
610 615 620 625 

GGA CTT ACT TTA AGA GTA GGC TCG CGT TTG CAA ATG CGT GAC GGC ATT 
Gly Leu Thr Leu Arg Val Gly Ser Gly Leu Gin Net Arg Asp Gly He 
630 635 640 

TTA ACA GTT ACA CCC AGC GGC ACT CCT ATT GAG CCC AGA CTG ACT GCC 
Leu Thr Val Thr Pro Ser Gly Thr Pro lie Glu Pro Arg Leu Thr Ala 
645 650 655 



CCA CTG ACT CAG ACA GAG AAT GGA ATC GGG CTC GCT CTC GGC GCC GGC 
Pro Leu Thr Gin Thr Glu Asn Gly He Gly Leu Ala Leu Gly Ala Gly 
660 665 670 
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3117 
3165 
3213 
3261 
3309 
3357 
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3453 
3501 
3549 
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3645 
3693 
3741 
3789 
3837 
3885 
3933 
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TTG GAA TTA GAC GAG AGC GCG CTC CAA GTA AAA GTT GGG CCC GGC ATG 3981 
Leu Glu Leu Asp Glu Ser Ala Leu Gin Val Lys Val Gly Pro Gly Met 
675 680 685 

CGC CTG AAC CCT GTA GAA AAG TAT GTA ACC CTG CTC CTG GGT CCT GGC 4029 
Arg Leu Asn Pro Val Glu Lys Tyr Val Thr Leu Leu Leu Gly Pro Gly 
690 695 700 705 

5 CTT AGT TTT GGG CAG CCG GCC AAC AGG ACA AAT TAT GAT GTG CGC GTT 4077 
Leu Ser Phe Gly Gin Pro Ala Asn Arg Thr Asn Tyr Asp Val Arg Val 
710 715 720 

TCT GTG GAG CCC CCC ATG GTT TTC GGA CAG CGT GGT CAG CTC ACA TTT 4125 
Ser Val Glu Pro Pro Met Val Phe Gly Gin Arg Gly Gin Leu Thr Phe 
725 730 735 

T.TA GTG GGT CAC GGA CTA CAC ATT CAA AAT TCC AAA CTT CAG CTC AAT 4173 
Leu Val Gly His Gly Leu His lie Gin Asn Ser Lys Leu Gin Leu Asn 
10 740 745 750 

TTG GGA CAA GGC CTC AGA ACT GAC CCC GTC ACC AAC CAG CTG GAA GTG 4221 
Leu Gly Gin Gly Leu Arg Thr Asp Pro Val Thr Asn Gin Leu Glu Val 
755 760 765 

CCC CTC GGT CAA GGT TTG GAA ATT GCA GAC GAA TCC CAG GTT AGG GTT 4269 
Pro Leu Gly Gin Gly Leu Glu lie Ala Asp Glu Ser Gin Val Arg Val 
770 775 780 785 

15 AAA TTG GGC GAT GGC CTG CAG TTT GAT TCA CAA GCT CGC ATC ACT ACC 4317 
Lys Leu Gly Asp Gly Leu Gin Phe Asp Ser Gin Ala Arg lie Thr Thr 
790 795 800 

GCT CCT AAC ATG GTC ACT GAA ACT CTG TGG ACC GGA ACA GGC AGT AAT 4365 
Ala Pro Asn Met Val Thr Glu Thr Leu Trp Thr Gly Thr Gly Ser Asn 
805 810 815 

GCT AAT GTT ACA TGG CGG GGC TAC ACT GCC CCC GGC AGC AAA CTC TTT 4413 
Ala Asn Val Thr Trp Arg Gly Tyr Thr Ale Pro Gly Ser Lys Leu Phe 
20 820 825 830 

TTG AGT CTC ACT CGG TTC AGC ACT GGT CTA GTT TTA GGA AAC ATG ACT 4461 
Leu Ser Leu Thr Arg Phe Ser Thr Gly Leu Val Leu Gly Asn Met Thr 
835 840 845 

ATT GAC AGC AAT GCA TCC TTT GGG CAA TAC ATT AAC GCG GGA CAC GAA 4509 
lie Asp Ser Asn Ala Ser Phe Gly Gin Tyr He Asn Ala Gly His Glu 
850 855 860 865 

25 CAG ATC GAA TGC TTT ATA TTG TTG GAC AAT CAG GGT AAC CTA AAA GAA 4557 
Gin He Glu Cys Phe lie Leu Leu Asp Asn Gin Gly Asn Leu Lys Glu 
870 875 880 

GGA TCT AAC TTG CAA GGC ACT TGG GAA GTG AAG AAC AAC CCC TCT GCT 4605 
Gly Ser Asn Leu Gin Gly Thr Trp Glu Val Lys Asn Asn Pro Ser Ala 
885 890 895 

TCC AAA GCT GCT TTT TTG CCT TCC ACC GCC CTA TAC CCC ATC CTC AAC 4653 
Ser Lys Ala Ala Phe Leu Pro Ser Thr Ala Leu Tyr Pro lie Leu Asn 
3 0 900 905 910 

GAA AGC CGA GGG AGT CTT CCT GGA AAA AAT CTT GTG GGC ATG CAA GCC 4701 
Glu Ser Arg Gly Ser Leu Pro Gly Lys Asn Leu Val Gly Met Gin Ala 
915 920 925 

ATA CTG GGA GGC GGG GGC ACT TGC ACT GTG ATA GCC ACC CTC AAT GGC 4749 
He Leu Gly Gly Gly Gly Thr Cys Thr Val He Ala Thr Leu Asn Gly 
930 935 940 945 

35 AGA CGC AGC AAC AAC TAT CCC GCG GGC CAG TCC ATA ATT TTC GTG TGG 4797 
Arg Arg Ser Asn Asn Tyr Pro Ala Gly Gin Ser He He Phe Val Trp 
950 955 960 

CAA GAA TTC AAC ACC ATA GCC CGC CAA CCT CTG AAC CAC TCT ACA CTT 4845 
Gin Glu Phe Asn Thr He Ala Arg Gin Pro Leu Asn His Ser Thr Leu 
965 970 975 

ACT TTT TCT TAC TGG ACT TAAATAAGTT GGAAATAAAG AGTTAAACTG 4893 
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Thr Phe Ser Tyr Trp Thr 
980 

AATGTTTAAG TGCAACAGAC TTTTATTGGT TTTGGCTCAC AACAAATTAC AACAGCATAG 4953 
ACAAGTCATA CCGGTCAAAC AACACAGGCT CTCGAAAACG GGCTAACCGC TCCAAGAATC 5013 
TGTCACGCAG ACGAGCAAGT CCTAAATGTT TTTTCACTCT CTTCGGGGCC AAGTTCAGCA 5073 
TGTATCGGAT TTTCTGCTTA CACCTTT 5100 
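
Each coding row above pairs a run of codons with the residues printed beneath it. As a rough cross-check of that pairing, the sketch below translates one row with the standard genetic code and lists any column where the translation and the printed residue disagree. It assumes the standard nuclear codon table and plain A/C/G/T input; `translate_row` and `mismatches` are illustrative names, not part of the original disclosure.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...), one amino-acid letter per codon, "*" = stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate_row(row: str) -> list[str]:
    """Translate a row of space-separated codons into three-letter residues.

    Translation stops at the first token that is not a codon (for example a
    trailing base-count number or run-on non-coding bases) or at a stop codon.
    """
    residues = []
    for token in row.split():
        aa = CODON_TABLE.get(token.upper())
        if aa is None or aa == "*":
            break
        residues.append(THREE_LETTER[aa])
    return residues


def mismatches(codon_row: str, residue_row: str) -> list[tuple[int, str, str]]:
    """Return (column, translated, printed) for each column that disagrees."""
    printed = residue_row.split()
    translated = translate_row(codon_row)
    return [
        (col, got, want)
        for col, (got, want) in enumerate(zip(translated, printed), start=1)
        if got != want
    ]


if __name__ == "__main__":
    row = "AAG GCT GGG CCC GGA CTT ACA ATA ACT AAT CAA GCT CTT ACT GTT GCT"
    print(" ".join(translate_row(row)))
    # Lys Ala Gly Pro Gly Leu Thr Ile Thr Asn Gln Ala Leu Thr Val Ala
```

Stopping at the first non-codon token lets the same function read the final row, where the TAA stop codon runs straight into the 3' non-coding bases.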



(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 983 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

Ser His Pro Pro Val Asn Ile Met Lys Arg Ser Val Pro Gln Asp Phe
1 5 10 15

Asn Leu Val Tyr Pro Tyr Lys Ala Lys Arg Pro Asn Ile Met Pro Pro
20 25 30

Phe Phe Asp Arg Asn Gly Phe Val Glu Asn Gln Glu Ala Thr Leu Ala
35 40 45

Met Leu Val Glu Lys Pro Leu Thr Phe Asp Lys Glu Gly Ala Leu Thr
50 55 60

Leu Gly Val Gly Arg Gly Ile Arg Ile Asn Pro Ala Gly Leu Leu Glu
65 70 75 80

Thr Asn Asp Leu Ala Ser Ala Val Phe Pro Pro Leu Ala Ser Asp Glu
85 90 95

Ala Gly Asn Val Thr Leu Asn Met Ser Asp Gly Leu Tyr Thr Lys Asp
100 105 110

Asn Lys Leu Ala Val Lys Val Gly Pro Gly Leu Ser Leu Asp Ser Asn
115 120 125

Asn Ala Leu Gln Val His Thr Gly Asp Gly Leu Thr Val Thr Asp Asp
130 135 140

Lys Val Ser Leu Asn Thr Gln Ala Pro Leu Ser Thr Thr Ser Ala Gly
145 150 155 160

Leu Ser Leu Leu Leu Gly Pro Ser Leu His Leu Gly Glu Glu Glu Arg
165 170 175

Leu Thr Val Asn Thr Gly Ala Gly Leu Gln Ile Ser Asn Asn Ala Leu
180 185 190

Ala Val Lys Val Gly Ser Gly Ile Thr Val Asp Ala Gln Asn Gln Leu
195 200 205

Ala Ala Ser Leu Gly Asp Gly Leu Glu Ser Arg Asp Asn Lys Thr Val
210 215 220

Val Lys Ala Gly Pro Gly Leu Thr Ile Thr Asn Gln Ala Leu Thr Val
225 230 235 240

Ala Thr Gly Asn Gly Leu Gln Val Asn Pro Glu Gly Gln Leu Gln Leu
245 250 255

Asn Ile Thr Ala Gly Gln Gly Leu Asn Phe Ala Asn Asn Ser Leu Ala
260 265 270

Val Glu Leu Gly Ser Gly Leu His Phe Pro Pro Gly Gln Asn Gln Val
275 280 285

Ser Leu Tyr Pro Gly Asp Gly Ile Asp Ile Arg Asp Asn Arg Val Thr
290 295 300

Val Pro Ala Gly Pro Gly Leu Arg Met Leu Asn His Gln Leu Ala Val
305 310 315 320

Ala Ser Gly Asp Gly Leu Glu Val His Ser Asp Thr Leu Arg Leu Lys
325 330 335

Leu Ser His Gly Leu Thr Phe Glu Asn Gly Ala Val Arg Ala Lys Leu
340 345 350

Gly Pro Gly Leu Gly Thr Asp Asp Ser Gly Arg Ser Val Val Arg Thr
355 360 365

Gly Arg Gly Leu Arg Val Ala Asn Gly Gln Val Gln Ile Phe Ser Gly
370 375 380

Arg Gly Thr Ala Ile Gly Thr Asp Ser Ser Leu Thr Leu Asn Ile Arg
385 390 395 400

Ala Pro Leu Gln Phe Ser Gly Pro Ala Leu Thr Ala Ser Leu Gln Gly
405 410 415

Ser Gly Pro Ile Thr Tyr Asn Ser Asn Asn Gly Thr Phe Gly Leu Ser
420 425 430

Ile Gly Pro Gly Met Trp Val Asp Gln Asn Arg Leu Gln Val Asn Pro
435 440 445

Gly Ala Gly Leu Val Phe Gln Gly Asn Asn Leu Val Pro Asn Leu Ala
450 455 460

Asp Pro Leu Ala Ile Ser Asp Ser Lys Ile Ser Leu Ser Leu Gly Pro
465 470 475 480

Gly Leu Thr Gln Ala Ser Asn Ala Leu Thr Leu Ser Leu Gly Asn Gly
485 490 495

Leu Glu Phe Ser Asn Gln Ala Val Ala Ile Lys Ala Gly Arg Gly Leu
500 505 510

Arg Phe Glu Ser Ser Ser Gln Ala Leu Glu Ser Ser Leu Thr Val Gly
515 520 525

Asn Gly Leu Thr Leu Thr Asp Thr Val Ile Arg Pro Asn Leu Gly Asp
530 535 540

Gly Leu Glu Val Arg Asp Asn Lys Ile Ile Val Lys Leu Gly Ala Asn
545 550 555 560

Leu Arg Phe Glu Asn Gly Ala Val Thr Ala Gly Thr Val Asn Pro Ser
565 570 575

Ala Pro Glu Ala Pro Pro Thr Leu Thr Ala Glu Pro Pro Leu Arg Ala
580 585 590

Ser Asn Ser His Leu Gln Leu Ser Leu Ser Glu Gly Leu Val Val His
595 600 605

Asn Asn Ala Leu Ala Leu Gln Leu Gly Asp Gly Met Glu Val Asn Gln
610 615 620

His Gly Leu Thr Leu Arg Val Gly Ser Gly Leu Gln Met Arg Asp Gly
625 630 635 640

Ile Leu Thr Val Thr Pro Ser Gly Thr Pro Ile Glu Pro Arg Leu Thr
645 650 655

Ala Pro Leu Thr Gln Thr Glu Asn Gly Ile Gly Leu Ala Leu Gly Ala
660 665 670

Gly Leu Glu Leu Asp Glu Ser Ala Leu Gln Val Lys Val Gly Pro Gly
675 680 685

Met Arg Leu Asn Pro Val Glu Lys Tyr Val Thr Leu Leu Leu Gly Pro
690 695 700

Gly Leu Ser Phe Gly Gln Pro Ala Asn Arg Thr Asn Tyr Asp Val Arg
705 710 715 720

Val Ser Val Glu Pro Pro Met Val Phe Gly Gln Arg Gly Gln Leu Thr
725 730 735

Phe Leu Val Gly His Gly Leu His Ile Gln Asn Ser Lys Leu Gln Leu
740 745 750

Asn Leu Gly Gln Gly Leu Arg Thr Asp Pro Val Thr Asn Gln Leu Glu
755 760 765

Val Pro Leu Gly Gln Gly Leu Glu Ile Ala Asp Glu Ser Gln Val Arg
770 775 780

Val Lys Leu Gly Asp Gly Leu Gln Phe Asp Ser Gln Ala Arg Ile Thr
785 790 795 800

Thr Ala Pro Asn Met Val Thr Glu Thr Leu Trp Thr Gly Thr Gly Ser
805 810 815

Asn Ala Asn Val Thr Trp Arg Gly Tyr Thr Ala Pro Gly Ser Lys Leu
820 825 830

Phe Leu Ser Leu Thr Arg Phe Ser Thr Gly Leu Val Leu Gly Asn Met
835 840 845

Thr Ile Asp Ser Asn Ala Ser Phe Gly Gln Tyr Ile Asn Ala Gly His
850 855 860

Glu Gln Ile Glu Cys Phe Ile Leu Leu Asp Asn Gln Gly Asn Leu Lys
865 870 875 880

Glu Gly Ser Asn Leu Gln Gly Thr Trp Glu Val Lys Asn Asn Pro Ser
885 890 895

Ala Ser Lys Ala Ala Phe Leu Pro Ser Thr Ala Leu Tyr Pro Ile Leu
900 905 910

Asn Glu Ser Arg Gly Ser Leu Pro Gly Lys Asn Leu Val Gly Met Gln
915 920 925

Ala Ile Leu Gly Gly Gly Gly Thr Cys Thr Val Ile Ala Thr Leu Asn
930 935 940

Gly Arg Arg Ser Asn Asn Tyr Pro Ala Gly Gln Ser Ile Ile Phe Val
945 950 955 960

Trp Gln Glu Phe Asn Thr Ile Ala Arg Gln Pro Leu Asn His Ser Thr
965 970 975

Leu Thr Phe Ser Tyr Trp Thr
980

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 227 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

Met Ser Lys Glu Ile Pro Thr Pro Tyr Met Trp Ser Tyr Gln Pro Gln
1 5 10 15

Met Gly Leu Ala Ala Gly Ala Ala Gln Asp Tyr Ser Thr Arg Ile Asn
20 25 30

Tyr Met Ser Ala Gly Pro His Met Ile Ser Arg Val Asn Gly Ile Arg
35 40 45

WO 95/16048 PCT/CA94/00678

Ala His Arg Asn Arg Ile Leu Leu Glu Gln Ala Ala Ile Thr Thr Thr
50 55 60

Pro Arg Asn Asn Leu Asn Pro Arg Ser Trp Pro Ala Ala Leu Val Tyr
65 70 75 80

Gln Glu Ser Pro Ala Pro Thr Thr Val Val Leu Pro Arg Asp Ala Gln
85 90 95

Ala Glu Val Gln Met Thr Asn Ser Gly Ala Gln Leu Ala Gly Gly Phe
100 105 110

Arg His Arg Val Arg Ser Pro Gly Gln Gly Ile Thr His Leu Lys Ile
115 120 125

Arg Gly Arg Gly Ile Gln Leu Asn Asp Glu Ser Val Ser Ser Ser Leu
130 135 140

Gly Leu Arg Pro Asp Gly Thr Phe Gln Ile Gly Gly Ala Gly Arg Ser
145 150 155 160

Ser Phe Thr Pro Arg Gln Ala Ile Leu Thr Leu Gln Thr Ser Ser Ser
165 170 175

Glu Pro Arg Ser Gly Gly Ile Gly Thr Leu Gln Phe Ile Glu Glu Phe
180 185 190

Val Pro Ser Val Tyr Phe Asn Pro Phe Ser Gly Pro Pro Gly His Tyr
195 200 205

Pro Asp Gln Phe Ile Pro Asn Phe Asp Ala Val Lys Asp Ser Ala Asp
210 215 220

Gly Tyr Asp
225

(2) INFORMATION FOR SEQ ID NO:28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 128 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

Met Thr Asp Thr Leu Asp Leu Glu Met Asp Gly Ile Ile Thr Glu Gln
1 5 10 15

Arg Leu Leu Glu Arg Arg Arg Ala Ala Ala Glu Gln Gln Arg Met Asn
20 25 30

Gln Glu Leu Gln Asp Met Val Asn Leu His Gln Cys Lys Arg Gly Ile
35 40 45

Phe Cys Leu Val Lys Gln Ala Lys Val Thr Tyr Asp Ser Asn Thr Thr
50 55 60

Gly His Arg Leu Ser Tyr Lys Leu Pro Thr Lys Arg Gln Lys Leu Val
65 70 75 80

Val Met Val Gly Glu Lys Pro Ile Thr Ile Thr Gln His Ser Val Glu
85 90 95

Thr Glu Gly Cys Ile His Ser Pro Cys Gln Gly Pro Glu Asp Leu Cys
100 105 110

Thr Leu Ile Lys Thr Leu Cys Gly Leu Lys Asp Leu Ile Pro Phe Asn
115 120 125



(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 582 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

Met Lys Arg Ala Arg Pro Ser Glu Asp Thr Phe Asn Pro Val Tyr Pro
1 5 10 15

Tyr Asp Thr Glu Thr Gly Pro Pro Thr Val Pro Phe Leu Thr Pro Pro
20 25 30

Phe Val Ser Pro Asn Gly Phe Gln Glu Ser Pro Pro Gly Val Leu Ser
35 40 45

Leu Arg Val Ser Glu Pro Leu Asp Thr Ser His Gly Met Leu Ala Leu
50 55 60

Lys Met Gly Ser Gly Leu Thr Leu Asp Lys Ala Gly Asn Leu Thr Ser
65 70 75 80

Gln Asn Val Thr Thr Val Thr Gln Pro Leu Lys Lys Thr Lys Ser Asn
85 90 95

Ile Ser Leu Asp Thr Ser Ala Pro Leu Thr Ile Thr Ser Gly Ala Leu
100 105 110

Thr Val Ala Thr Thr Ala Pro Leu Ile Val Thr Ser Gly Ala Leu Ser
115 120 125

Val Gln Ser Gln Ala Pro Leu Thr Val Gln Asp Ser Lys Leu Ser Ile
130 135 140

Ala Thr Lys Gly Pro Ile Thr Val Ser Asp Gly Lys Leu Ala Leu Gln
145 150 155 160

Thr Ser Ala Pro Leu Ser Gly Ser Asp Ser Asp Thr Leu Thr Val Thr
165 170 175

Ala Ser Pro Pro Leu Thr Thr Ala Thr Gly Ser Leu Gly Ile Asn Met
180 185 190

Glu Asp Pro Ile Tyr Val Asn Asn Gly Lys Ile Gly Ile Lys Ile Ser
195 200 205

Gly Pro Leu Gln Val Ala Gln Asn Ser Asp Thr Leu Thr Val Val Thr
210 215 220

Gly Pro Gly Val Thr Val Glu Gln Asn Ser Leu Arg Thr Lys Val Ala
225 230 235 240

Gly Ala Ile Gly Tyr Asp Ser Ser Asn Asn Met Glu Ile Lys Thr Gly
245 250 255

Gly Gly Met Arg Ile Asn Asn Asn Leu Leu Ile Leu Asp Val Asp Tyr
260 265 270

Pro Phe Asp Ala Gln Thr Lys Leu Arg Leu Lys Leu Gly Gln Gly Pro
275 280 285

Leu Tyr Ile Asn Ala Ser His Asn Leu Asp Ile Asn Tyr Asn Arg Gly
290 295 300

Leu Tyr Leu Phe Asn Ala Ser Asn Asn Thr Lys Lys Leu Glu Val Ser
305 310 315 320

Ile Lys Lys Ser Ser Gly Leu Asn Phe Asp Asn Thr Ala Ile Ala Ile
325 330 335

Asn Ala Gly Lys Gly Leu Glu Phe Asp Thr Asn Thr Ser Glu Ser Pro
340 345 350

Asp Ile Asn Pro Ile Lys Thr Lys Ile Gly Ser Gly Ile Asp Tyr Asn
355 360 365

Glu Asn Gly Ala Met Ile Thr Lys Leu Gly Ala Gly Leu Ser Phe Asp
370 375 380

Asn Ser Gly Ala Ile Thr Ile Gly Asn Lys Asn Asp Asp Lys Leu Thr
385 390 395 400

Leu Trp Thr Thr Pro Asp Pro Ser Pro Asn Cys Arg Ile His Ser Asp
405 410 415

Asn Asp Cys Lys Phe Thr Leu Val Leu Thr Lys Cys Gly Ser Gln Val
420 425 430

Leu Ala Thr Val Ala Ala Leu Ala Val Ser Gly Asp Leu Ser Ser Met
435 440 445

Thr Gly Thr Val Ala Ser Val Ser Ile Phe Leu Arg Phe Asp Gln Asn
450 455 460

Gly Val Leu Met Glu Asn Ser Ser Leu Lys Lys His Tyr Trp Asn Phe
465 470 475 480

Arg Asn Gly Asn Ser Thr Asn Ala Asn Pro Tyr Thr Asn Ala Val Gly
485 490 495

Phe Met Pro Asn Leu Leu Ala Tyr Pro Lys Thr Gln Ser Gln Thr Ala
500 505 510

Lys Asn Asn Ile Val Ser Gln Val Tyr Leu His Gly Asp Lys Thr Lys
515 520 525

Pro Met Ile Leu Thr Ile Thr Leu Asn Gly Thr Ser Glu Ser Thr Glu
530 535 540

Thr Ser Glu Val Ser Thr Tyr Ser Met Ser Phe Thr Trp Ser Trp Glu
545 550 555 560

Ser Gly Lys Tyr Thr Thr Glu Thr Phe Ala Thr Asn Ser Tyr Thr Phe
565 570 575

Ser Tyr Ile Ala Gln Glu
580

(2) INFORMATION FOR SEQ ID NO:30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Modified-site
(B) LOCATION: 2

(D) OTHER INFORMATION: /note= "This position is X2."

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(B) LOCATION: 4

(D) OTHER INFORMATION: /note= "This position is X13."

(ix) FEATURE:

(A) NAME/KEY: Modified-site
(B) LOCATION: 6

(D) OTHER INFORMATION: /note= "This position is X2."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:

Cys Xaa Cys Xaa Cys Xaa Cys
1 5
(2) INFORMATION FOR SEQ ID NO:31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

Gln Ser Ser Xaa Ser Thr Ser
1 5

(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

Pro Leu Leu Phe Ala Phe Val Leu Cys Thr Gly Cys Ala Val Leu Leu
1 5 10 15

Thr Ala Phe Gly Pro Ser Ile Leu Ser Gly Thr
20 25 
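
Every entry in this listing states its residue count under "(A) LENGTH". As a minimal sketch, assuming the listing text is available as plain three-letter codes with numeric position rulers, the count can be checked by converting to one-letter code; the helper names and the `SEQ32` constant are illustrative, and the example uses the 27-residue SEQ ID NO:32 entry.

```python
ONE_LETTER = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # unspecified residue
}


def to_one_letter(listing: str) -> str:
    """Convert a three-letter listing to one-letter code.

    Purely numeric tokens (position rulers such as "1 5 10 15") are skipped;
    any other unrecognised token raises, so a damaged code is never counted.
    """
    letters = []
    for token in listing.split():
        if token.isdigit():
            continue
        if token not in ONE_LETTER:
            raise ValueError(f"unrecognised residue code: {token!r}")
        letters.append(ONE_LETTER[token])
    return "".join(letters)


def matches_declared_length(listing: str, declared: int) -> bool:
    """True when the residue count equals the '(A) LENGTH' value."""
    return len(to_one_letter(listing)) == declared


SEQ32 = """
Pro Leu Leu Phe Ala Phe Val Leu Cys Thr Gly Cys Ala Val Leu Leu
1 5 10 15
Thr Ala Phe Gly Pro Ser Ile Leu Ser Gly Thr
20 25
"""

print(to_one_letter(SEQ32))                # PLLFAFVLCTGCAVLLTAFGPSILSGT
print(matches_declared_length(SEQ32, 27))  # True
```

Raising on unknown tokens, rather than skipping them, is deliberate: a misread code would otherwise shorten the count without any warning.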

(2) INFORMATION FOR SEQ ID NO:33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 57 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:

Glu Glu Val Thr Ser His Phe Phe Leu Asp Cys Pro Glu Asp Pro Ser
1 5 10 15

Arg Glu Cys Ser Ser Cys Gly Phe His Gln Ala Gln Ser Gly Ile Pro
20 25 30

Gly Ile Met Cys Ser Leu Cys Tyr Met Arg Gln Thr Tyr His Cys Ile
35 40 45

Tyr Ser Pro Val Ser Glu Glu Glu Met
50 55 
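
SEQ ID NO:30 writes a cysteine pattern as Cys Xaa Cys Xaa Cys Xaa Cys, and its feature notes state that the three Xaa positions stand for X2, X13 and X2, i.e. Cys-X2-Cys-X13-Cys-X2-Cys. As a sketch, that pattern can be searched for with a regular expression; here it is applied to SEQ ID NO:33, hard-coded in one-letter form. The constant and function names are illustrative.

```python
import re

# Cys-X2-Cys-X13-Cys-X2-Cys, where "." stands for any residue.
CYS_MOTIF = re.compile(r"C.{2}C.{13}C.{2}C")

# SEQ ID NO:33 in one-letter code (57 residues).
SEQ33 = "EEVTSHFFLDCPEDPSRECSSCGFHQAQSGIPGIMCSLCYMRQTYHCIYSPVSEEEM"


def find_cys_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in CYS_MOTIF.finditer(seq)]


print(find_cys_motifs(SEQ33))  # [(19, 'CSSCGFHQAQSGIPGIMCSLC')]
```

The single hit spans residues 19 to 39 of SEQ ID NO:33, with 13 residues between the second and third cysteines as the X13 note requires.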

(2) INFORMATION FOR SEQ ID NO:34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Val Asp Leu Glu Cys His Glu Val Leu Pro Pro Ser 
1 5 10 
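
SEQ ID NO:34 is the BAV3 stretch (residues 26-37) that the figures align with the Ad5 Rb binding sequence. Its Leu-Glu-Cys-His-Glu segment fits the Leu-X-Cys-X-Glu (LxCxE) pattern commonly used to describe Rb-binding motifs. A minimal sketch of locating that pattern, with positions reported in the parent protein's numbering, follows; the names are illustrative.

```python
import re

# Leu-X-Cys-X-Glu, where "." stands for any residue.
LXCXE = re.compile(r"L.C.E")


def find_lxcxe(seq: str, first_position: int = 1) -> list[tuple[int, str]]:
    """Return (position, segment) for each LxCxE match.

    first_position is the residue number of seq[0], so hits are reported in
    the numbering of the parent protein.
    """
    return [(m.start() + first_position, m.group()) for m in LXCXE.finditer(seq)]


# SEQ ID NO:34 (BAV3 residues 26-37) in one-letter code.
BAV3_26_37 = "VDLECHEVLPPS"
print(find_lxcxe(BAV3_26_37, first_position=26))  # [(28, 'LECHE')]
```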
Claims



1. A live recombinant bovine adenovirus
vector (BAV) system selected from the group consisting
of:

(a) a system wherein part or all of
the E1 gene region is replaced by a heterologous
nucleotide sequence encoding a foreign gene or
fragment thereof;

(b) a system wherein a part or all of
the E3 gene region is replaced by a heterologous
nucleotide sequence encoding a foreign gene or
fragment thereof; and

(c) a system wherein part or all of
the E1 gene region and part or all of the E3 gene
region are deleted and a heterologous nucleotide
sequence encoding a foreign gene or fragment thereof
is inserted into at least one of the deletions.

2. The BAV system of claim 1 which is a
bovine adenovirus type 3.

3. The BAV system of claim 1 wherein (a) a
recombinant BAV wherein part or all of the E1 gene
region is replaced by a heterologous nucleotide
sequence encoding a foreign gene or fragment thereof.

4. The BAV system of claim 1 wherein (b) a
recombinant BAV wherein a part or all of the E3 gene
region is replaced by a heterologous nucleotide
sequence encoding a foreign gene or fragment thereof.

5. The BAV system of claim 1 wherein the
foreign nucleotide sequence is with or without the
control of an exogenous promoter.

6. The BAV system of claim 1 wherein (c) a
system wherein part or all of the E1 gene region and
part or all of the E3 gene region are deleted and a
heterologous nucleotide sequence encoding a foreign
gene or fragment thereof is inserted into at least one
of the deletions.

7. A recombinant vector system comprising
the entire BAV genome and a plasmid capable of
generating a recombinant virus by in vivo
recombination following cotransfection of a suitable
cell line comprising the entire BAV genome
representing the wild-type BAV genome and a plasmid
comprising adenovirus left end nucleotide sequences
containing the E1A gene region or a plasmid comprising
adenovirus right end sequences containing the E3 gene
region, the plasmid with a heterologous nucleotide
sequence encoding a foreign gene or fragment thereof
substituted for part or all of the E1 and/or E3 gene
regions, respectively.

8. A recombinant bovine adenovirus vector
system comprising two plasmids capable of generating a
recombinant virus by in vivo recombination following
cotransfection of a cell line comprising

(1) a first plasmid comprising the
entire BAV genome except for a deletion of part or all
of the E1 and/or E3 gene regions, and

(2) a second plasmid comprising BAV
left or right end nucleotide sequences containing the
E1 or E3 gene regions, respectively, having a
heterologous nucleotide sequence encoding a foreign
gene or fragment thereof inserted for the deletion of
a part or all of the E1 or E3 gene regions.

9. A live viable recombinant bovine
adenovirus (BAV) comprising a deletion of part or all
of the E1 gene region, a deletion of part or all of
the E3 gene region or deletion of both, and inserted
into at least one deletion a heterologous nucleotide
sequence coding for a polypeptide or an antigenic
determinant produced by a disease-causing organism.
10. A live viable recombinant bovine
adenovirus (BAV) for producing an immune response in a
mammalian host comprising:

(1) a live bovine adenovirus (BAV)
modified to contain a heterologous nucleotide sequence
coding for a polypeptide or an antigenic determinant
corresponding to the desired immune response in
association with or without

(2) an effective promoter for said
nucleotide sequence to provide expression of said
antigenic determinant in immunogenic non-pathogenic
quantities.



11. A live recombinant bovine adenovirus
expression system comprising a deletion of all or part
of the E1 gene region or all or part of the E3 gene
region, or both deletions and inserted in at least one
deletion a heterologous nucleotide sequence coding for
a foreign gene or fragment thereof under control of an
expression promoter with or without one or more
polyadenylation signals.

12. A recombinant mammalian cell line
comprising bovine adenovirus (BAV) E1 gene region,
said recombinant cell line thereby capable of allowing
replication therein of a bovine adenovirus comprising
an E1 deletion which may or may not be replaced by a
heterologous or homologous nucleotide sequence
encoding a foreign gene or fragment thereof.


13. The cell line of claim 12 which is a 
bovine cell line. 

14. The recombinant mammalian cell line of
claim 12 wherein the heterologous or homologous
nucleotide sequence encoding the foreign gene or
fragment thereof is selected from the group consisting
of a bovine adenovirus (BAV) E1 polypeptide, a BAV
E1-associated polypeptide, a growth factor, a cellular
receptor or other cellular polypeptide.

15. A recombinant mammalian cell line
comprising bovine adenovirus E1 genes, said
recombinant cell line thereby capable of allowing
DNA-mediated transfection to generate a recombinant bovine
adenovirus (BAV) selected from the group consisting
of:

(a) a recombinant BAV wherein part or all of
the E1 gene region is replaced by a heterologous
nucleotide sequence encoding a foreign gene or
fragment thereof,

(b) a recombinant BAV wherein part or all of
the E3 gene region is replaced by a heterologous
nucleotide sequence encoding a foreign gene or
fragment thereof,

(c) a recombinant BAV wherein part or all of
the E1 gene region and part or all of the E3 gene
region are deleted and inserted into at least one
deletion a heterologous nucleotide sequence encoding a
foreign gene or fragment thereof,

(d) a recombinant BAV wherein part or all of
the E1 gene region and/or part or all of the E3 gene
region are deleted and inserted into at least one
deletion a heterologous nucleotide sequence encoding
more than one foreign gene or fragment thereof to
produce a recombinant fusion protein, and

(e) a mutant BAV wherein part or all of the
E1 gene region and/or part or all of the E3 gene
region are deleted.



16. A method of preparing a recombinant
polypeptide or fragment thereof which comprises:
(1) infecting the mammalian cell line
of claim 12 with a recombinant bovine adenovirus
comprising a deletion of part or all of the E1 gene
region and/or part or all of the E3 gene region and
inserted into at least one deletion a heterologous
nucleotide sequence encoding the polypeptide or
fragment thereof,

(2) replicating the recombinant virus
in a recombinant cell line under conditions to provide
for expression of the polypeptide, and

(3) recovering the recombinant
polypeptide or antigenic fragment thereof.

17. A method of isolating a polypeptide
which comprises: 

(1) replicating a recombinant 
mammalian cell line of claim 12 under conditions to 
provide for expression of the polypeptide, and 

(2) recovering the polypeptide or 
fragment thereof. 



18. A method for eliciting an immune 
response in a mammalian host to protect against an 
infection comprising: 

administering a vaccine composition 
comprising a live recombinant BAV of claim 1 wherein 
the foreign gene or fragment encodes an antigen with 
or without a pharmaceutically acceptable carrier. 

19. A method for eliciting an immune 
response in a mammalian host to protect against an 
infection comprising: 

administering a vaccine comprising a 
recombinant polypeptide or fragment thereof prepared 
by a method of claim 16 with or without a 
pharmaceutically acceptable carrier. 

20. A vaccine for protecting a mammalian 
host against infection comprising a live recombinant 
adenovirus of claim 1 wherein the foreign gene or 
fragment encodes an antigen with or without a 
pharmaceutically acceptable carrier. 
21. A vaccine for protecting a mammalian
host against infection comprising a recombinant
antigen prepared by a method of claim 16 with or
without a pharmaceutically acceptable carrier.

22. A mutant bovine adenovirus (BAV)
comprising a deletion of part or all of E1 and/or a
deletion of part or all of E3.

23. A method for providing gene therapy to
a mammal in need thereof to control a gene deficiency
which comprises administering to said mammal a live
recombinant bovine adenovirus containing a foreign
nucleotide sequence encoding a non-defective form of
said gene under conditions wherein the recombinant
virus vector genome is incorporated into said
mammalian genome or is maintained independently and
extrachromosomally to provide expression of the
required gene in the target organ or tissue.
Rb BINDING SEQUENCE

Ad5   120 IleAspLeuThrCysHisGluAlaGlyPheProProSer 132
BAV3   26 ValAspLeuGluCysHisGluVal---LeuProProSer  37



FIG. 2B 



Ad5   82 LeuAspPheSerThrProGlyArgAlaAlaAlaAlaValAlaPheLeuSerPheIle 100
BAV3  83 LeuAsp------ThrProGlyArgValValAlaAlaLeuAlaLeuLeuValPheIle  99



FIG. 3A 
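
The Ad5 (82-100) and BAV3 (83-99) segments compared above are 19 and 17 residues long, so the BAV3 row carries a two-residue gap when they are set side by side. As a hedged sketch, identity over such a gapped pairwise alignment can be tallied column by column, skipping columns that contain a gap. The one-letter strings and the placement of the gap right after Leu-Asp are assumptions made for this example, and the function name is illustrative.

```python
def gapped_identity(top: str, bottom: str, gap: str = "-") -> tuple[int, int]:
    """Return (identical columns, gap-free columns) for two aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    paired = [(x, y) for x, y in zip(top, bottom) if x != gap and y != gap]
    identical = sum(1 for x, y in paired if x == y)
    return identical, len(paired)


AD5_82_100 = "LDFSTPGRAAAAVAFLSFI"
BAV3_83_99 = "LD--TPGRVVAALALLVFI"  # assumed two-residue gap after Leu-Asp

same, aligned = gapped_identity(AD5_82_100, BAV3_83_99)
print(same, aligned, f"{100 * same / aligned:.1f}%")  # 12 17 70.6%
```

Dividing by the gap-free columns rather than by the full row length is one of several conventions; dividing by the 19 Ad5 residues instead would give a lower figure for the same alignment.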



Ad5   20 GlnSerSerAsnSerThrSer  26
BAV3 136 GlnSerSerArgSerThrSer 142



FIG. 3B 
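
The Ad5 residues 20-26 and BAV3 residues 136-142 compared above differ at a single position, and SEQ ID NO:31 records that stretch as Gln Ser Ser Xaa Ser Thr Ser. A minimal sketch of deriving such a consensus from two equal-length, gap-free one-letter segments follows. It marks every mismatch as X (printed as Xaa) and reports percent identity; `consensus` is an illustrative name, not a method described in the source.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "X": "Xaa",
}


def consensus(a: str, b: str) -> tuple[str, float]:
    """Column-wise consensus of two gap-free segments; mismatches become X."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    cons = "".join(x if x == y else "X" for x, y in zip(a, b))
    identity = 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)
    return cons, identity


ad5_20_26 = "QSSNSTS"     # Gln Ser Ser Asn Ser Thr Ser
bav3_136_142 = "QSSRSTS"  # Gln Ser Ser Arg Ser Thr Ser

cons, pct = consensus(ad5_20_26, bav3_136_142)
print(" ".join(THREE_LETTER[c] for c in cons), f"({pct:.1f}% identical)")
# Gln Ser Ser Xaa Ser Thr Ser (85.7% identical)
```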
Ad2 MSKEIPTPYMWSYQPQMGLAAGAAQDYSTRINYMSAGPHMISRVNGIRAH 50

BAV3 LIKQPWGTTHV — EMPRNEVLEQH 23 

Ad2 RNRILLEQAAITTTPRNNLNPRSWPAALVYQESPAPTTVVLPRDAQAEVQ 100

BAV3 LTSHGAQIAGGG AAGDYFKSPTSARTLIPLTASCL RPDG 62 

Ad2 MTNSGAQLAGGFRHRVRSPGQGITHLKIRGRGIQLNDESVSSSLGLRPDG 150 

BAV3 VFQLGGGSRSSFNP LQTDFAFHALPSRPRHGGIGSRQFVEEFVPAVYLNP 112 

Ad2 TFQIGGAGRSSFTPRQAILTLQTSSSEPRSGGIGTLQFIEEFVPSVYFNP 200 

BAV3 YSGPPDSYPDQFIRHYNVYSNSVSGYS 139 

Ad2 FSGPPGHYPDQFIPNFDAVKDSADGYD 227 

FIG. 8A 



BAV3 M EP DGVHAEQQF I LNQ I SCANTALQRQREELAS LVMLHACKRGL 77

Ad5 MTDTLDLEMDGIITEQRLL--ERRRAAAEQQRMNQELQDMVNLHQCKRGI 48

BAV3 FCPVKTYKLSLNASASEHSLHFEKSPSRFTLVNTHAGASVRVALHHQGAS 127 

Ad5 FC L VKQ AKVT YD S NTTGHRL S YKLP TKRQKLVVMVGEKP ITITQHSVETE 98 

BAV3 GS IRCSCSHAECLPVLLKTLCAFNFLD 154

Ad5 GCIHSPCQGPEDLCTLIKTLCGLKDLIPFN 128 

FIG. 8B 
BAV3 - MKRS VPQD - - FNLVYP YKAKR PNIMPPFFDRNGFVENQEATLAML -43

Ad2 - MKRARPSEDTFNPVYPYDTETGPPTVPFLTPPFVSPNGFQESPPGVLSLR -50 

BAV3 - VEKPLTFDKE - GALTLGVGRGIRINPAGLLETNDLASAVFPPLASDEAGN -92 


Ad2 - VSEPL- -DTSHGMLALKMGSGLTLDKAGNLTSQNVTTV -86 

BAV3 - VTLNMSDGLYTKDNKLAVKVGPG -142 

Ad2 - TQPLKKTKSNISLDTSAPLTI-TSGALTVATTAPLIVTSGALSVQ -130

BAV3 - TQAPLSTTSAGLSLLLGPSLHLGEEERLTVNTGAGLQISNNALAVKVGSG -192 

Ad2 - SQAPLT- — VQDSKLSIATKGPITVSDGKLALQTSAP -164 

BAV3 - ITVDAQNQLAASLGDGLESRDNKTVVKAGPGLTITNQALTVATGNGLQVN -242

Ad2 - LSGSDSDTLTVT - : : ASPPLTTATGS-LGIN -191

BAV3 - PEGQLQLNITAGQGLNFANNSLAVELGSGLHFPPGQNQVSLYPGDGIDIR -292 

Ad2 - MEDPIYVN NGKI GI KI SGPLQVAQ -215 

BAV3 - DNRVTVPAGPGLRMLNHQLAVASGDGLEVHSDTLRLKLSHGLTFENGAVR -342 

Ad2 - NSDTLTVVTGPGVTVEQNSLR -236

FIG. 8C-1

BAV3 - AKLGPGLGTDDSGRSVVRTGRGLRVANGQVQIFSGRGTAIGTDSSLTLNI -392

Ad2 - TKVAGAIGYDSSNNMEIKTGGGMRINNNL- - LILDVDYPFDAQTKLRLKL -284 

BAV3 - RAPLQFSGPALTASLQGSGPITYNSNNGTFGLSIGPGMWVDQNRLQVNPG -442

Ad2 - GQGPLY INASHN LDINYN -302

BAV3 - AGLVFQGNNLVPNLADPLAISDSKISLSLGPGLTQASNALTLSLGNGLEF -492

Ad2 - RGLYL FNASNNTKKLEVS I KKS S GLNF -329 

BAV3 - SNQAVAIKAGRGLRFESSSQALESSLTVGNGLTLTDTVIRPNLGDGLEVR -542 

Ad2 - DNTAIAINAGKGLEFDTNT -348 

BAV3 - DNKIIVKLGANLRFENGAVTAGTVNPSAPEAPPTLTAEPPLRASNSHLQL -592 

Ad2 - : -348 

BAV3 - SLSEGLVVHNNALALQLGDGMEVNQHGLTLRVGSGLQMRDGILTVTPSGT -642

Ad2 - SESPDIN- -PIKTKIGSGID YNENGA -372

BAV3 - PIEPRLTAPLTQTENGIGLALGAGLELDESALQVKVGPGMRLNPVEKYVT -692

Ad2 - MIT- KLGAGLSFDNSG 387 



FIG. 8C-2 



BAV3 - LLLGPGLSFGQPANRTNYDVRVSVEPPMVFGQRGQLTFLVGHGLHIQNSK -742

Ad2 AITIG NKNDDKLTLWTTPDPSP- NCR -412 

BAV3 - LQLNLGQGLRTDPVTNQLEVPLGQGLEIADESQVRVKLGDGLQFDSQARI -792

Ad2 - IHSD NDCKFTLVLT KCGSQVLA -434 

BAV3 - TTAPNMVTETLWTGTGSNANVTWRGYTAPGSKLFLSLTRFSTGLVLGNMT -842

Ad2 - TVAALAVSGDLSSMTGTVASVS - - IFLRFDQ- -NGVLMENSS -472

BAV3 - IDSNASFGQYINAGHEQIECFILLDNQGNLKEGSNLQGTWEVKNNPSASK -892 


Ad2 - LKKHY — WNFRNGNS TNANPYTNA -494 

BAV3 - AAFLPSTALYPILNESRGSLPGKNLVGMQAILGGGGTCTVIA-TLNGRRS -941

Ad2 - VGFMPNLLAYP KTQSQTAKNNIVSQVYLHGDKTKPMILTITLNGTSE -541 

BAV3 - NNYPAGQSIIFVWQ-EFNTIARQPLNHSTLTFSYWT -976

Ad2 - STETSEVSTYSMSFTWSWESGKYTTETFATNSYTFSYIAQE -582 